so I assume clearml moves them from one queue to the other?
Correct. When it creates the k8s job and launches it on the cluster it moves it into the queue.
Can you see it on your k8s cluster (meaning the job/pod)?
Are there any references (vlog/blog) on deploying a real-time model and building the continuous training pipeline in ClearML?
Something along the lines of this one ?
https://clear.ml/blog/creating-a-fully-automatic-retraining-loop-using-clearml-data/
Or this one?
https://www.youtube.com/watch?v=uNB6FKIi8Wg
So it seems to get the "hint" from the type:
This will work: tf.summary.image('toy255', (ex * 255).astype(np.uint8), step=step, max_outputs=10)
wdyt, should it actually check min/max and manually cast it?
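Something like this (a minimal sketch, assuming a NumPy image batch and an active default summary writer; the helper name is made up) would check the range and cast manually instead of relying on the dtype hint:

import numpy as np
import tensorflow as tf

def log_image(tag, images, step, max_outputs=10):
    images = np.asarray(images)
    if images.dtype != np.uint8:
        lo, hi = images.min(), images.max()
        # rescale whatever range we got into [0, 1], then cast to uint8
        if hi > lo:
            images = (images - lo) / (hi - lo)
        images = (images * 255).astype(np.uint8)
    tf.summary.image(tag, images, step=step, max_outputs=max_outputs)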
Hi ScaryKoala63
Which versions are you using (clearml / lightning) ?
Hi GrievingTurkey78
First, I would look at the CLI clearml-data as a baseline for implementing such a tool:
Docs:
https://github.com/allegroai/clearml/blob/master/docs/datasets.md
Implementation :
https://github.com/allegroai/clearml/blob/master/clearml/cli/data/main.py
Regarding your questions:
(1) No, a new dataset version will only store the diff from the parent (if files are removed, it stores metadata saying the file was removed)
(2) Yes any get operation will downl...
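For reference, a rough sketch of the SDK calls the CLI wraps (assuming the clearml Dataset API; project and dataset names are placeholders):

from clearml import Dataset

# create a new version as a child of an existing dataset; only the diff is stored
parent = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
child = Dataset.create(
    dataset_project="examples",
    dataset_name="my_dataset",
    parent_datasets=[parent.id],
)
child.add_files("data/new_files/")   # only new/changed files are uploaded
child.upload()
child.finalize()

# a "get" fetches the full version (parents + diffs) locally
local_copy = Dataset.get(dataset_id=child.id).get_local_copy()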
Thanks CooperativeFox72 ! I'll test and keep you posted 🙂
UpsetBlackbird87
pipeline.start() will launch the pipeline itself on a remote machine (a machine running the services agent).
This is why your pipeline is "stuck": it is not actually running.
When you call start_locally() the pipeline logic itself is running on your machine and the nodes are running on the workers.
Makes sense ?
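Roughly, the difference looks like this (a hedged sketch; project, task and queue names are placeholders):

from clearml import PipelineController

pipe = PipelineController(name="example pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="training task",
    execution_queue="default",
)

# start() enqueues the pipeline *logic* for a remote agent (the services agent by default);
# the local process does not run the pipeline itself.
pipe.start(queue="services")

# start_locally() runs the pipeline logic in this process, while the steps
# are still enqueued and executed by the workers:
# pipe.start_locally(run_pipeline_steps_locally=False)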
As I suspected, from your log:
agent.package_manager.system_site_packages = false
Which is exactly the problem of the missing tensorflow (basically it creates a new venv inside the docker, but without the flag on, it does not inherit the docker's preinstalled packages)
This flag should have been true.
Could it be that the clearml.conf you are providing for the glue includes this value?
(basically you should only have the sections that are either credentials or missing from the default, there...
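i.e. something along these lines (a sketch of a minimal override, assuming the standard agent.package_manager section of clearml.conf; credentials omitted):

agent {
    package_manager {
        # inherit the packages preinstalled in the docker image
        system_site_packages: true
    }
}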
query tasks that are both Running --> You mean status=["in_progress"]
Yes!
How do I figure out other possible parameters I can use with the status parameter?
https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksget_all
https://clear.ml/docs/latest/docs/references/api/definitions#taskstask
Filter only tasks that started, say, 10 min ago. Is there any parameter for that as well?
last_update or created then use...
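For example (a sketch assuming the clearml SDK; the client-side time filter is just one way to do it, and the server-side date-comparison syntax is described in the tasks.get_all reference):

from datetime import datetime, timedelta
from clearml import Task

tasks = Task.get_tasks(
    project_name="my_project",
    task_filter={
        "status": ["in_progress"],
        "order_by": ["-last_update"],
    },
)

# hypothetical: keep only tasks created in the last 10 minutes
cutoff = datetime.utcnow() - timedelta(minutes=10)
recent = [t for t in tasks if t.data.created.replace(tzinfo=None) >= cutoff]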
Hmm, I think it is this line:
WARNING - Model configuration only supports dictionary or string objects
done
Let me check something.
@PipelineDecorator.component(
    name="my step", return_values=['data_frame'], cache=True, task_type=TaskTypes.data_processing)
def step_one(pickle_data_url: str, extra: int = 43):
    # stuff here
    ...

This seemed to work for me
RobustRat47 what's the Triton container you are using ?
BTW, the Triton error is:
model_repository_manager.cc:1152] failed to load 'test_model_pytorch' version 1: Internal: unable to create stream: the provided PTX was compiled with an unsupported toolchain.
https://github.com/triton-inference-server/server/issues/3877
and of course:
task.set_parameters_as_dict(params)
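e.g. (a minimal sketch; project/task names and the parameter values are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="params example")
# a nested dict becomes hyperparameter sections on the task
task.set_parameters_as_dict({"General": {"lr": 0.001, "batch_size": 32}})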
We might need to change the default base docker image, but I remember it was there... Let me check again
Hi IntriguedRat44
You can make it log offline (i.e. into a local folder/zip) by calling:
Task.set_offline(True)
You can also set the environment variable:
TRAINS_OFFLINE_MODE=1
You could also just skip the Trains.init call 😉
Does that help?
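For example (a minimal sketch; project/task names are placeholders):

from trains import Task  # in the newer clearml SDK: `from clearml import Task`

# everything is written to a local folder/zip instead of being sent to the server
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")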
Trains is fully open-source; that said, properly publishing and maintaining the web client is still on our to-do list (I mean, there is totally readable JavaScript code packaged in the trains-server and the dockers). It keeps getting pushed back because there are generally fewer contributions on the front-end with these kinds of projects. That said, if you guys are willing to help, it will greatly help in pushing it forward... LivelyLion31 what do you think, would you guys like to help with the fronte...
I guess. Or pipelines that you can compose after running experiments, to see how experiments are connected to each other
hmm what do you mean by "compose after running experiments"? like a way to group them? what is the relation between one "item" and another?
If this is a sequence of Tasks , are they executed by a controller ?
Actually it would be interesting to combine the two, feast is fully open-source and supported by the linux foundation, so I cannot see the harm in that.
wdyt?
We are working hard on release 1.7 once that is out we will push an RC for review (I hope) 🙂
Sorry, what I meant is that it is not documented anywhere that the agent should run in docker mode, hence my confusion
This is a good point! I'll make sure we stress it (BTW: it will work with elevated credentials, but probably not recommended)
I'm glad to hear 🙂
If you can reproduce it, let me know
MoodyCentipede68 it seems you did not pass any configuration (os env or conf file), so it does not know how to find the server and authenticate. Make sense?
that's the entire repo link ? not something like https://github.com/ ... ?
PompousParrot44 , so you mean like a base conda env?
Configuring trains-agent to use conda is done here:
https://github.com/allegroai/trains-agent/blob/699d13bbb34649c7e5337b4187cda59b7fa6fd33/docs/trains.conf#L44
Then for every experiment trains-agent will create a new conda environment based on the requirements of that experiment.
You can tell it to inherit the base conda env (or the one it is running from, I think) by setting
system_site_packages: true
https://github.com/allegroai/tr...
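i.e. something like (a sketch of the relevant trains.conf section):

agent {
    package_manager {
        # create per-experiment environments with conda
        type: conda
        # inherit the packages of the environment the agent runs from
        system_site_packages: true
    }
}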
I'm trying to get a task to run using a specific docker image and to source a bash script before execution of the python script.
Are you running an agent in docker mode ? if so you should be able to see the Output of your bash script first thing in the log
(and it will appear in the docker CMD)
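Something along these lines should do it (a hedged sketch, assuming a recent clearml SDK where set_base_docker accepts a docker_setup_bash_script; the image, script path and queue names are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="docker step")
task.set_base_docker(
    docker_image="nvidia/cuda:11.8.0-runtime-ubuntu22.04",
    # commands the docker-mode agent runs inside the container before the python script
    docker_setup_bash_script=["source /opt/setup_env.sh"],
)
task.execute_remotely(queue_name="default")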
SoreDragonfly16 could you reproduce the issue?
What's your OS? trains versions?
GiddyTurkey39
BTW: you can always add the missing package via code:
Task.add_requirements('torch', optional_version)
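e.g. (a minimal sketch; the version pin and project/task names are placeholders):

from clearml import Task

# must be called before Task.init(); the version argument is optional
Task.add_requirements("torch", "1.13.1")
task = Task.init(project_name="examples", task_name="torch requirement example")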