
...indicate the job needs to be run remotely? I'm imagining something like
clearml-task
and you need to specify the queue to push your Task into.
See here: https://clear.ml/docs/latest/docs/apps/clearml_task
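For example (a sketch; project/repo/script/queue names are placeholders):
```bash
# create a Task from a repo script and push it straight into an execution queue
clearml-task --project examples --name remote-run \
    --repo https://github.com/user/repo.git --script train.py \
    --queue default
```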
I want to use the services queue for running services, and I want to do it on k8s
So yes, as a standalone pod with the agent in venv mode (as opposed to docker mode)
Does that make sense to you?
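For example (a sketch; assumes the clearml.conf / credentials are already set up inside the pod):
```bash
# inside the standalone pod: run the agent in venv mode against the services queue
clearml-agent daemon --queue services
```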
Hi TroubledJellyfish71
What do you have listed under the Task's execution "installed packages" section (of the original Task)?
How did it end up with an http link for pytorch?
Usually it would be torch==1.11
...
EDIT:
I'm assuming the original Task was executed on a Mac M1, what are you getting when calling pip freeze?
And where is the agent running? (and is it venv or docker mode?)
I see, actually what you should do is a fully custom endpoint:
- preprocessing -> download the video
- processing -> extract frames and send them to Triton with gRPC (see below how)
- post-processing -> return a human readable answer
Regarding the processing itself, what you need is to take this function (copy paste):
None
have it as internal `_process...
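Roughly, the shape would be something like this (a sketch only; the helper methods and request field names are hypothetical placeholders, not the actual copy-paste function):
```python
from typing import Any

class Preprocess(object):
    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # download the video referenced in the request (field name is hypothetical)
        return self._download_video(body["video_url"])

    def process(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # extract frames and send each one to Triton over gRPC
        return [self._triton_grpc_infer(frame) for frame in self._extract_frames(data)]

    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        # return a human readable answer
        return {"results": data}

    # hypothetical internal helpers, to be filled in
    def _download_video(self, url): ...
    def _extract_frames(self, video_path): ...
    def _triton_grpc_infer(self, frame): ...
```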
Okay that means it is running in virtual environment mode.
On the original Task (the one you enqueued) what were the installed packages (specifically the torch/torchvision)?
Nesting in the UI is not possible, I think?
Yes, but the next version will have nested projects, that's something 🙂
I mean that it is possible to start the subtask while the main task is still active.
You cannot call another Task.init while a main one is running.
But you can call Task.create and log into it; that said, autologging is not supported on the newly created Task.
Maybe the easiest solution is just to do the "sub-tasks" and close them. That means the main Task i...
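Something along these lines (a minimal sketch; project/task names are placeholders, and mark_started() is used here because the created Task starts as a draft):
```python
from clearml import Task

# the main experiment; autologging works here as usual
main_task = Task.init(project_name="examples", task_name="main")

# a sub-task created while the main Task is still running;
# autologging is NOT hooked into it, only explicit reporting
sub_task = Task.create(project_name="examples", task_name="sub-task")
sub_task.mark_started()
sub_task.get_logger().report_scalar(
    title="loss", series="sub", value=0.05, iteration=0)
sub_task.mark_completed()
```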
Thanks BitterStarfish58 !
It does not seem to be related to the upload. The upload itself finished... What's your Trains version?
How so? They are in one place? The creation of the venv is transparent, and the packages that are there are everything you have in the docker, plus the ability to override them from the UI.
What am I missing here?
GrievingTurkey78 yes, you are correct on both.
Will the sweep functionality work?
Yes it should, that said, it will not use the trains-agent
so you are limited to the machine running the sweep.
If you want to do HPO on multi-node, check out this example 🙂
https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
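A minimal sketch of the multi-node flavor (uses the current clearml package names; the template Task ID, metric names, and queue are placeholders, agents pulling the queue run the trials):
```python
from clearml import Task
from clearml.automation import (
    HyperParameterOptimizer, RandomSearch, UniformParameterRange)

task = Task.init(project_name="examples", task_name="hpo-controller")

optimizer = HyperParameterOptimizer(
    base_task_id="TEMPLATE_TASK_ID",  # the Task cloned for every trial
    hyper_parameters=[
        UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1)],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=RandomSearch,
    execution_queue="default",  # agents on other nodes pull trials from here
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```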
Yes the clearml-server AMI - we want to be able to back it up and encrypt it on our account
I think the easiest and safest way for you is to actually have full control over the AMI, and recreate once from scratch.
Basically any ubuntu/centos + docker and docker-compose should do the trick, wdyt?
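Something like this on a fresh Ubuntu box (a sketch; the compose file path follows the clearml-server docs):
```bash
# install docker + docker-compose, fetch the server compose file, bring it up
sudo apt-get update && sudo apt-get install -y docker.io docker-compose
sudo mkdir -p /opt/clearml
sudo curl -o /opt/clearml/docker-compose.yml \
    https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml
sudo docker-compose -f /opt/clearml/docker-compose.yml up -d
```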
GiganticTurtle0 fix was just pushed to GitHub 🙂
pip install git+
SoggyFrog26 you'll have it in the next RC 🙂
Not sure what the plan is; I know one should be out today/tomorrow, worst case on the next one 🙂
HurtWoodpecker30
The agent uses the requirements.txt
What do you mean by that? Aren't the packages listed in the "Installed packages" section of the Task?
(or is it empty when starting, i.e. it uses the requirements.txt from the GitHub repo, and then the agent lists them back into the Task)
Sure: Dataset.create(..., use_current_task=True)
This will basically attach/make the main Task the Dataset itself (Dataset is a type of Task, with logic built on top of it)
wdyt?
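i.e. something like (a minimal sketch; dataset/project names and the data folder are placeholders):
```python
from clearml import Task, Dataset

task = Task.init(project_name="examples", task_name="build-dataset")

# the current Task itself becomes the Dataset
dataset = Dataset.create(
    dataset_name="my-dataset",
    dataset_project="examples",
    use_current_task=True,
)
dataset.add_files("./data")
dataset.upload()
dataset.finalize()
```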
Sure thing, I'll fix the "create_draft" docstring to suggest it
Yes, I was referring to logging the "clearml-data" Dataset ID on the Task itself, not an external database.
Make sense?
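For example (a sketch; names are placeholders):
```python
from clearml import Task, Dataset

task = Task.init(project_name="examples", task_name="train")

# fetch the dataset and record its ID on the Task itself
dataset = Dataset.get(dataset_project="examples", dataset_name="my-dataset")
task.set_parameter("data/dataset_id", dataset.id)
local_copy = dataset.get_local_copy()
```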
Let me check, it was supposed to be automatically aborted
Full markdown edit on the project, so you can create your own reports and share them (you can also put links to the experiments themselves inside the markdown). Notice this is not per-experiment reporting (we kind of assumed maintaining a per-experiment report is not realistic).
With k8s glue going, I want to finally look at clearml-session and how people are using it.
If used with the k8s glue, you will have to run the glue with --ports-mode; then the clearml-session will know how to connect to the container itself, since at runtime the container will register the gateway + port for the clearml-session client to connect to.
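Roughly (a sketch; queue name and docker image are placeholders):
```bash
# on the cluster side: run the k8s glue with ports-mode enabled
python k8s_glue_example.py --queue k8s_sessions --ports-mode

# on the user side: open a session against that queue
clearml-session --queue k8s_sessions --docker nvidia/cuda:11.0-runtime-ubuntu20.04
```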
Hi SlimyElephant79
As you can imagine, wandb's tracking code would be present across the code modules and I was hoping for a structured approach that would help me transition to ClearML's experiment tracking.
Do you guys have a layer in between that does the reporting, or is the codebase riddled with direct reporting calls? If the latter, then I guess search and replace? Or maybe a module that "converts" wandb calls to clearml calls? wdyt?
Exactly! Nice 🙂
Scheduled training is what I'm looking forward to
I'll try to remember to update here once we push it to the GitHub repo, feedback is always appreciated 🙂
If in the next two weeks you hear nothing, please ping here to make sure I did not forget 🙂
Hi SourSwallow36
- The same docker image is used for all three jobs, just because it is easier to manage and faster to download. The full code is available on the trains-server GitHub. If you want to spin up the containers manually, check the docker-compose.yml on the main repo, it has all the commands there
- Fork the trains-server, commit the changes and don't forget to PR them ;)
- Elasticsearch is a database; we use it to log all the experiments' outputs: console logs, metrics, etc. This...
JitteryCoyote63 I think this only holds for the conda distribution.
(Actually quite interesting, I wonder what happens if you already installed cudatoolkit...)