IdealPanda97 Hmm I see...
Well, unfortunately, Trains is all about free access to all 🙂
That said, the Enterprise edition does add permissions and data management on top of Trains. You can get in touch through https://allegro.ai/enterprise/#contact ; I'm sure someone will get back to you soon.
Hi @<1523701168822292480:profile|ExuberantBat52>
I am trying to execute a pipeline remotely,
How are you creating your pipeline? And are you referring to an issue with the pipeline logic, or is it a component that needs that repo installed?
Wait, is "SSH_AUTH_SOCK" defined on the host? it should auto mount the SSH folder as well?!
DisgustedDove53 , TrickySheep9
I'm all for it!
I can think of two options here: (1) use the k8s glue + apply template with ports mode, see the discussion https://clearml.slack.com/archives/CTK20V944/p1628091020175100
(2) create an interface (queue) to launch arbitrary job on the k8s cluster, with the full pod definition on the Task. This will allow the clearml-session to setup everything from the get go.
How would you interface with the k8s operator, and what exactly will it do?
(BTW: the reas...
WhimsicalLion91 I guess import/export is going to be more challenging, doable though. You will need to get all the Tasks, then collect all the artifacts, then collect all the reported logs (console/plots/etc). Then import everything back to your own server...
Exporting a single Task: task.export_task
and importing it back: Task.import_task
If you need all the scalars: task.get_reported_scalars(...)
And the console logs: Task.get_reported_console_output
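For example, a minimal migration sketch (the task ID is a placeholder):
```python
from clearml import Task

# Export a single Task into a portable dict
source = Task.get_task(task_id="<source_task_id>")
exported = source.export_task()

# Re-create it, e.g. against your own server (as configured in clearml.conf)
new_task = Task.import_task(exported)

# Collect the reported scalars and console logs of the source Task
scalars = source.get_reported_scalars()
console = source.get_reported_console_output()
```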
Is there any way to debug these sessions through clearml? Thanks!
Yes, this is a real problem; AWS does not make it easy to get this data...
Can you check the AWS console, see what you have there ?
In theory this should have worked.
Maybe you are missing some escaping for the "extra_vm_bash_script"?
I'm hoping the console output will tell us
Exactly!
Regarding adding feature store, probably not in the near future, a scalable feature store is quite the project, probably more realistic to somehow have a recipe to deploy with Feast
ShallowGoldfish8 I believe it was solved in 1.9.0, can you verify?
pip install clearml==1.9.0
If I were to push the private package to, say, Artifactory, is it possible to use that to do the install?
Yes that's the recommended way 🙂
You add the private repo here, for the agent to use:
https://github.com/allegroai/clearml-agent/blob/e93384b99bdfd72a54cf2b68b3991b145b504b79/docs/clearml.conf#L65
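For reference, a sketch of that section of clearml.conf (the Artifactory URL is a placeholder):
```
agent {
    package_manager {
        # additional PyPI-compatible index the agent passes to pip
        extra_index_url: ["https://artifactory.example.com/artifactory/api/pypi/my-repo/simple"]
    }
}
```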
ok, but this happens in my local machine, not in the agent
resource monitoring is always running in the background, even on local machines. (of course you can turn it off)
If this is the case:
dataset = Dataset.get(...)
dataset.get_dependency_graph()
https://clear.ml/docs/latest/docs/references/sdk/dataset#get_dependency_graph
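A runnable sketch (project/dataset names are placeholders):
```python
from clearml import Dataset

# Fetch an existing dataset version
dataset = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")

# Returns a mapping of dataset ID -> list of parent dataset IDs
graph = dataset.get_dependency_graph()
print(graph)
```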
Wait, with the Port it does not work?
Notice that since this is an external S3 you have to have the port specified, so it knows this is not AWS S3 but a different compatible service
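For example, a minimal clearml.conf sketch for an S3-compatible service such as MinIO (host and credentials are placeholders); the explicit port on the host is what marks it as non-AWS:
```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    # host includes the port, so it is treated as a
                    # non-AWS, S3-compatible endpoint
                    host: "my-minio.example.com:9000"
                    key: "<access_key>"
                    secret: "<secret_key>"
                    secure: false
                    multipart: false
                }
            ]
        }
    }
}
```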
Hi FloppyDeer99
What is the meaning of "no real scheduling"?
I think the meaning is that from the moment a k8s job is created, the k8s is in charge of actually spinning the container. Since k8s has no real priority/order, the scheduling order is not guaranteed from this point.
The idea of the clearml-k8s glue is that the glue will launch a job on the k8s cluster only if it is sure there are enough resources to actually spin the job now (as opposed to sometime in the future), this mea...
I cannot reproduce, tested with the same matplotlib version and python against the community server
Hi @<1559711593736966144:profile|SoggyCow20>
I would first like to say how amazing clearml is!
Thank you! 🙂
Running in Docker mode (v19.03 and above) - using default docker image: nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
yes sdk.agent.default_docker.image = python:3.10.0-alpine
should be: agent.default_docker.image = python:3.10.0-alpine
Notice the scope is agent, not sdk
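i.e. in the agent's clearml.conf:
```
agent {
    default_docker {
        # image used when the Task itself does not specify one
        image: "python:3.10.0-alpine"
    }
}
```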
GiganticTurtle0
If there are several tasks running concurrently, which task should Task.current_task() return?
How could you have that ?
Per process, there is one Main current Task (until you close it).
Are you referring to a pipeline with multiple steps ?
If this is the case, Task.current_task()
will return the Task of the component (if executed from the component) or the pipeline's Task (if called from the pipeline logic function).
Notice we added the ability to s...
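A minimal sketch of that behavior, assuming the decorator-based pipeline (names are illustrative):
```python
from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component()
def step_one():
    # Inside a component: current_task() is the component's own Task
    print("component task:", Task.current_task().id)

@PipelineDecorator.pipeline(name="demo pipeline", project="examples", version="1.0")
def pipeline_logic():
    # Inside the pipeline logic: current_task() is the controller's Task
    print("pipeline task:", Task.current_task().id)
    step_one()

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # run everything in-process for this demo
    pipeline_logic()
```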
Hi ShinyRabbit94
system_site_packages: true
This is set automatically when running in "docker mode", no need to worry 🙂
What exactly is the error you are getting?
Could it be the container itself has the python packages installed in a venv, not as "system packages"?
2021-07-11 19:17:32,822 - clearml.Task - INFO - Waiting to finish uploads
I'm assuming very large uncommitted changes 🙂
Hi @<1532532498972545024:profile|LittleReindeer37>
Does Hydra support notebooks? If it does, can you point to an example?
Hi @<1523702969063706624:profile|PoisedShark13>
However, the INSTALLED PACKAGES of my task misses many of the installed packages (any idea why?)
It automatically detects the directly imported packages, literally analyzing your code base and looking for imports
The derivative packages (i.e. the ones that any of the "main" packages need) will be listed after the first time the agent installs everything.
If something specific is missing, you can manually add it with:
Task.add_requirements
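A short sketch (the package names are just examples); note it must be called before Task.init():
```python
from clearml import Task

# Must be called before Task.init() so the requirement gets recorded
Task.add_requirements("xgboost")          # use whatever version is installed locally
Task.add_requirements("pandas", "1.5.3")  # or pin an explicit version

task = Task.init(project_name="examples", task_name="requirements demo")
```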
Hi ScaryKoala63
Which versions are you using (clearml / lightning) ?
I don't see any requests
This points to configuration, specifically maybe it is directed to a different server?!
Hi JitteryCoyote63 a few implementation details on the services-mode, because I'm not certain I understand the issue.
The docker-agent (running in services mode) will pick a Task from the services queue, then it will set up the docker for it, spin it up, and make sure the Task starts running inside the docker (once it is running inside the docker you will see the service Task registered as an additional node in the system, until the Task ends); once that happens the trains-agent will try to fetch the...
oh the pipeline logic itself holds one "job" on the worker, and this is why you do not have any spare workers to run the components of the pipeline.
Run your worker with --services-mode
; it will launch multiple Tasks at the same time, which should solve the issue
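For example, something along these lines (the queue name is an example):
```
clearml-agent daemon --queue services --services-mode --docker
```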
DepressedChimpanzee34
What's the hydra version ?
I tested with 1.1.0dev3 and it worked for me
Hmm yes, that is a good point, maybe we should allow specifying a parameter on the model configuration to help with the actual type...