
BTW: the new pipeline decorator interface example is here:
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
Also what do you have in the "Configuration" section of the serving inference Task?
CleanWhale17 nice ...
So the answer is Trains supports the Pipeline / Automation of it, but lacks that dataset integration (that is basically up to you to manage, with either artifacts or any other method)
The Allegro Enterprise allows you to rerun the code on a new version of the dataset, from the UI (or automation), without changing a single line of code
Glue machine or K8S Worker machine?
The K8s worker machine.
You could also configure an ingest service as part of the template, so they always have an external port mapped into the port.
Hi DeliciousBluewhale87
Hmm, good question.
Basically the idea is that if you have ingestion service on the pods (i.e. as part of the yaml template used by the k8s glue) you can specify to the glue what are the exposed ports, so it knows (1) what's the maximum of instances it can spin, e.g. one per port (2) it will set the external port number on the Task, so that the running agent/code will be aware of the exposed port.
A use case for it would be combining the clearml-session with the k8s gl...
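To picture the port bookkeeping described above, here is a minimal sketch. It is not the k8s glue's actual code: `PortPool`, `acquire`, and `release` are hypothetical names, used only to illustrate the "one instance per exposed port" idea and how the chosen port could be recorded per Task.

```python
from typing import Dict, Iterable, Optional


class PortPool:
    """Hypothetical helper: hands out one exposed port per running instance."""

    def __init__(self, ports: Iterable[int]) -> None:
        self._free = sorted(set(ports))
        self._in_use: Dict[str, int] = {}

    @property
    def max_instances(self) -> int:
        # One instance per exposed port, so the pool size is the cap.
        return len(self._free) + len(self._in_use)

    def acquire(self, task_id: str) -> Optional[int]:
        """Return the port assigned to task_id, or None if the pool is exhausted."""
        if task_id in self._in_use:
            return self._in_use[task_id]
        if not self._free:
            return None
        port = self._free.pop(0)
        self._in_use[task_id] = port
        return port

    def release(self, task_id: str) -> None:
        """Give the port back once the instance for task_id is gone."""
        port = self._in_use.pop(task_id, None)
        if port is not None:
            self._free.append(port)
            self._free.sort()
```

With a pool of two ports, a third `acquire` returns `None` until one of the first two Tasks is released.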
Hi @<1668065560107159552:profile|VivaciousPenguin20>
I think you are looking at the wrong experiment, this is a 3-year-old experiment? This does not seem to be your currently executed experiment, right?
I guess last followup question, is there a way to cap costs?
Scale tier ? (I know it is not per usage, but it is probably more than 15$ per user)
HungryArcticwolf62 the new clearml-serving is almost out (eta late next week), you can already start playing here:
https://github.com/allegroai/clearml-serving/tree/dev
Example:
train+serve
https://github.com/allegroai/clearml-serving/tree/dev/examples/sklearn
could you send the entire log here?
i.e. from the "docker-compose" command line and onward
As we can't create keys in our AWS due to infosec requirements
Hmmm
You mean to add these two to the model when deploying?
│   ├── model_NVIDIA_GeForce_RTX_3080.plan
│   ├── model_Tesla_T4.plan
Notice that preprocess.py is not running on the GPU instance, it is running on a CPU instance (technically not the same machine)
Hi WickedStarfish97
As a result, I don't want the Agent to parse what imports are being used / install dependencies whatsoever
Nothing to worry about here, even if the agent detects the python packages, they are installed on top of the preexisting packages inside the docker. That said, if you want to override it, you can also pass packages=[]
I'm getting: hydra_core == 1.1.1
What's the setup you have? python version, OS, Conda yes/no?
Follow-up question: how does clearML "inject" the argparse arguments before the task is initialized?
it patches the actual parse_args call, to make sure it works you just need to make sure it was imported before the actual call takes place
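To make the "patching parse_args" idea concrete, here is a minimal sketch of the general technique using only the standard library. It is an illustration, not ClearML's actual implementation: `OVERRIDES`, `install_patch`, and `remove_patch` are hypothetical names. It also shows why ordering matters: the override only takes effect for calls made after the patch is installed.

```python
import argparse

# Keep a reference to the original method so the patch can delegate to it.
_original_parse_args = argparse.ArgumentParser.parse_args

# Hypothetical store of values that should replace what the command line gave.
OVERRIDES = {"lr": "0.5"}


def _patched_parse_args(self, args=None, namespace=None):
    """Parse as usual, then overwrite any argument listed in OVERRIDES."""
    parsed = _original_parse_args(self, args, namespace)
    for action in self._actions:
        if action.dest in OVERRIDES:
            raw = OVERRIDES[action.dest]
            # Reuse the argument's declared type (e.g. float) when it has one.
            convert = action.type if callable(action.type) else str
            setattr(parsed, action.dest, convert(raw))
    return parsed


def install_patch() -> None:
    """Swap in the patched method; only later parse_args calls are affected."""
    argparse.ArgumentParser.parse_args = _patched_parse_args


def remove_patch() -> None:
    """Restore the original argparse behaviour."""
    argparse.ArgumentParser.parse_args = _original_parse_args
```

Any `parse_args` call that runs before `install_patch()` sees the plain command-line values, which is the reason the import has to happen first.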
I had to do another workaround since when torch.distributed.run called its ArgumentParser, it was getting the arguments from my script (and from my task) instead of the ones I passed it
Are you saying...
Hi @<1630377234361487360:profile|RoughSeaturtle43>
code from gitlab repo with ssl cert.
what do you mean by ssl secret? is it SSH or app-token ?
Hi SubstantialElk6
ClearML-Data doesn't actually "load" the data, it brings it locally and returns a folder with all your data files. From that point onward, it's up to your code to load it to the framework. Make sense ?
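A minimal sketch of that split, assuming the dataset is a set of CSV files: `load_rows` is a hypothetical loader that only needs a folder path, and `fetch_and_load` shows where the local copy would come from. The project and dataset names passed in are placeholders you would supply yourself.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_rows(dataset_folder: str) -> List[Dict[str, str]]:
    """Hypothetical loader: read every CSV file under the folder into row dicts."""
    rows: List[Dict[str, str]] = []
    for csv_path in sorted(Path(dataset_folder).rglob("*.csv")):
        with csv_path.open(newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def fetch_and_load(dataset_project: str, dataset_name: str) -> List[Dict[str, str]]:
    """Bring the dataset locally, then hand the folder to our own loader."""
    # Imported here so load_rows stays usable without clearml installed.
    from clearml import Dataset

    folder = Dataset.get(
        dataset_project=dataset_project, dataset_name=dataset_name
    ).get_local_copy()
    return load_rows(folder)
```

The same pattern applies to any framework: swap `load_rows` for whatever reads files from a directory in your code.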
@<1523710674990010368:profile|GreasyPenguin14> make sure it uses https, not ssh:
edit ~/clearml.conf
force_git_ssh_protocol: false
and that you have both git_user & git_pass set in your clearml.conf
I have a process that cleans the /tmp each day,
WackyRabbit7 the files (configuration etc.) that are mapped into the containers are stored there.
They should clean themselves, that said, we have noticed that the services-mode skips this cleanup, and it will be solved on the next RC of clearml-agent.
Make sense ?
So how do I solve the problem? Should I just relaunch the agents? Because they can't execute jobs now
Are you running in docker mode ?
If so you can actually delete mapped files (they will still be available inside the docker), just make sure you delete them X hours after they were created, and you should be fine.
wdyt?
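A minimal sketch of such an age-based cleanup, assuming modification time is an acceptable stand-in for creation time (Linux does not expose creation time portably). `delete_older_than` is a hypothetical helper, and the folder you point it at is up to you.

```python
import time
from pathlib import Path
from typing import List, Optional


def delete_older_than(
    folder: str, max_age_hours: float, now: Optional[float] = None
) -> List[str]:
    """Delete files under folder whose mtime is older than max_age_hours.

    Returns the sorted list of removed paths. Directories are left in place.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed: List[str] = []
    # Materialize the listing first so deletions do not disturb the walk.
    for path in list(Path(folder).rglob("*")):
        try:
            if path.is_file() and path.stat().st_mtime < cutoff:
                path.unlink()
                removed.append(str(path))
        except FileNotFoundError:
            # Another process removed the file in the meantime; skip it.
            continue
    return sorted(removed)
```

Running this from the daily cleaner with a generous `max_age_hours` keeps recently mapped files in place while still clearing out old ones.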
Set it on the PID of the agent process itself (i.e. the clearml-agent python process)
if they're mission critical, but rather the clearml cache folder?
hmmm... they are important, but only when starting the process. any specific suggestion ?
(and they are deleted after the Task is done, so they are temp)
using caching where specified but the pipeline page doesn't show anything at all.
What do you mean by " the pipeline page doesn't show anything at all."? are you running the pipeline ? how ?
Notice PipelineDecorator.component needs to be at the top level, not nested inside the pipeline logic, like in the original example
@PipelineDecorator.component(
cache=True,
name=f'append_string_{x}',
)
Hi SteadySeagull18
What does the intended workflow for making a "pipeline from tasks" look like?
The idea is if you have existing Tasks in the system and you want to launch them one after the other with control over inputs (or outputs of them) you can do that, without writing any custom code.
Currently, I have a script which does some Task.create's,
Notice that your script should call Task.init, not Task.create, as Task.create is designed to create additional ...
DeliciousBluewhale87 basically any solution that is compliant with the S3 protocol will work. An example: output_uri="
:PORT/bucket/folder"
Are you sure Nexus supports this protocol ?
I "think" nexus sits on top of a storage solution (like an object storage), meaning we can use the same storage solution Nexus is using.
Just to clarify we do not support the artifactory protocol Nexus provides for storing models/artifacts. But we do support it as a source for python packages used by the a...
ShaggyHare67 are you saying the problem is that trains fails to discover the packages in the manual execution ?
Hi ItchyJellyfish73
This seems aligned with the scenario you are describing, it seems the api server is overloaded with simultaneous connections.
Add an additional apiserver instance to the docker-compose and an nginx as load balancer:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L4
`
apiserver:
  command:
  - apiserver
  container_name: clearml-apiserver
  image: allegroai/clearml:latest
  restart: unless-sto...