Hi CharmingBeetle38
On the base task, do you see those arguments under the Configuration tab?
Also, if they are under the Args section, you should add the "Args/" prefix in the HP optimization (this is how you differentiate between the sections).
CharmingBeetle38 try adding "General/" before the arguments, so batch_size becomes General/batch_size. This is only because we are accessing the parameters externally; when the task is executed, it is resolved automatically.
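For example, a minimal sketch of wiring this into the optimizer (task id, metric names, and queue are placeholders, not your actual values):
from clearml.automation import HyperParameterOptimizer, UniformIntegerParameterRange

# Note the "General/" prefix: section name + "/" + parameter name
optimizer = HyperParameterOptimizer(
    base_task_id='<base-task-id>',
    hyper_parameters=[
        UniformIntegerParameterRange('General/batch_size', min_value=16, max_value=128, step_size=16),
    ],
    objective_metric_title='validation',
    objective_metric_series='loss',
    objective_metric_sign='min',
    execution_queue='default',
)
optimizer.start()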
YummyMoth34
It tried to upload all events and then killed the experiment
Could you send a log?
Also, what's the trains package version?
No, it is zipped and stored, so in order to open the zip file and read the files you have to download them.
That said, everything is cached, so if the machine already downloaded the dataset there is zero download / unzipping.
Make sense?
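For reference, a rough sketch of reading a dataset through the cache, assuming the clearml Dataset API (project/name are placeholders):
from clearml import Dataset

# First call downloads and unzips into the local cache;
# later calls on the same machine return the cached folder with zero download
dataset = Dataset.get(dataset_project='my_project', dataset_name='my_dataset')
local_path = dataset.get_local_copy()
print(local_path)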
Can you see the repo itself? The commit ID?
Ohh I see.
In your web app, look for the "?" icon (bottom left corner); clicking it should open the full platform documentation.
MelancholyElk85 notice there is the pipeline controller queue (i.e. which agent will run the logic of the pipeline), and the default queue for the pipeline steps (i.e. the actual steps of the pipeline).
The default queue for the pipeline logic itself is services; you can change it with pipeline.start(..., queue='another_q').
Make sense?
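To make it concrete, a minimal sketch (names and queues are placeholders):
from clearml import PipelineController

pipe = PipelineController(name='my_pipeline', project='my_project', version='1.0')
pipe.set_default_execution_queue('default')  # queue for the pipeline *steps*
pipe.add_step(name='step_one', base_task_project='my_project', base_task_name='step task')
pipe.start(queue='services')                 # queue for the pipeline *logic* itself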
The question remains though: why won't docker containers launch on services?
Maybe it's something with the way it was launched via docker-compose?
(I'm assuming it will fail on any docker container regardless, right?!)
Yeah.. that should have worked ...
What's the exact error you are getting?
We should probably add a set_task_type :)
So are you saying the large file download is the issue? (i.e. network issues)
Also, for a single parameter you can use:
cloned_task.set_parameter(name="Args/artifact_name", value="test-artifact", description="my help text that will appear in the UI next to the value")
This way, you are not overwriting the other parameters, you are adding to them.
(Similar to update_parameters, only for a single parameter)
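Putting the two together, a small sketch (task id and values are placeholders):
from clearml import Task

cloned_task = Task.clone(source_task='<base-task-id>')  # placeholder id

# Single parameter: adds/updates just this one, leaving the rest untouched
cloned_task.set_parameter(name='Args/artifact_name', value='test-artifact',
                          description='my help text that will appear in the UI next to the value')

# Bulk: merges this dict into the existing parameters
cloned_task.update_parameters({'Args/batch_size': 64, 'Args/epochs': 10})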
Hi ObedientTurkey46
Use --services-mode in the agent; it will run many Tasks on the same machine. This is usually associated with the services queue, but it can be run on any queue. This way you could have the same machine easily running those multiple "control" tasks.
wdyt?
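For example, something like this (queue name and docker mode are assumptions, adjust to your setup):
clearml-agent daemon --queue services --services-mode --docker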
So I wonder - why should an agent be related to a specific user's credentials? Is the right way to go about this to create a "fake user" for the sake of the agent?
Very true, you have to have credentials for the trains-agent so it can "report" to the trains-server. That said, the creator of the Task (i.e. the person who cloned it) will be registered as the "user" in the UI.
I would recommend creating an "agent" user and putting its credentials on the trains-agent machine (the same way...
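For reference, a sketch of the credentials section in the agent machine's trains.conf (server URLs and keys are placeholders):
api {
    web_server: "http://trains-server:8080"
    api_server: "http://trains-server:8008"
    credentials {
        access_key: "AGENT_USER_ACCESS_KEY"
        secret_key: "AGENT_USER_SECRET_KEY"
    }
}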
SmilingFrog76 this is not a weird mechanism at all, this is a proper HPC scheduler 🙂
trains-agent is not actually aware of other nodes; it is responsible for launching a Task on its own hardware (with whatever configuration it was set up with). What can be done is to use the trains-agent inside a 3rd party scheduler and have the scheduler allocate the node and trains-agent spin the experiment. There is a k8s example here: basically pulling jobs from the trains-server queue and pushing ...
For example, the Task object is heavily overloaded and its documentation would benefit from being separated into logical units of work. It would also make it easier for the ClearML team to spot any formatting issues.
This is a very good point (the current documentation is basically docstrings, but we should create a structured one)
... but some visualization/inline code with explanation is also very much welcome.
I'm assuming this is connected with the previous po...
DeterminedToad86 I suspect that since it was executed on SageMaker, it registered a specific package that is unique to SageMaker (not to worry, installed packages can be edited after you clone/reset the Task)
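If you'd rather fix it from code than the UI, a rough sketch, assuming set_packages is available in your clearml version:
from clearml import Task

cloned = Task.clone(source_task='<task-id>')     # the task that ran on SageMaker (placeholder id)
cloned.set_packages(['torch==1.13.1', 'numpy'])  # replace the registered requirements (example pins)
Task.enqueue(cloned, queue_name='default')       # assumption: your agent queue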
Apparently it ignores it and replaces everything...
I see, let me check something 🙂
Let me know if there is an issue 🙂
Hi JitteryCoyote63
I think this is the default python str() casting.
But you can specify the preview text when you call upload_artifact:
https://clear.ml/docs/latest/docs/references/sdk/task#upload_artifact
See the preview argument.
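A quick sketch (project/task names and the object are placeholders):
from clearml import Task

task = Task.init(project_name='examples', task_name='artifact preview')
my_dict = {'a': 1, 'b': 2}
task.upload_artifact(
    name='my_dict',
    artifact_object=my_dict,
    preview=str(my_dict),  # shown in the UI instead of the default str() casting
)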
JitteryCoyote63 Great to hear 🙂
BTW:
Would it be possible to extend Task.init with a force_reuse that would enforce reusing these tasks?
You can pass continue_last_task=True to Task.init
I think it should be equivalent to what you suggest
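A minimal sketch (project/task names are placeholders):
from clearml import Task

task = Task.init(
    project_name='examples',
    task_name='my experiment',
    continue_last_task=True,  # continue the most recent task with this name instead of creating a new one
)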