Hmm apparently it is not passed, but it could be.
Would the object itself be enough to get the values? Wouldn't it make sense to get them from outside somehow? (I'm assuming there is one set of args used at any given moment?)
I can't figure out how to pass my custom clearml.conf
Hi @<1544491301435609088:profile|TeenyElk27>
The easiest is to map it into the container in your docker-compose
(map a host clearml.conf into /root/clearml.conf inside the container)
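For example, a minimal docker-compose sketch (the service name and host path here are assumptions, adjust to your setup):
```yaml
services:
  clearml-agent-services:   # whichever service should see the config
    volumes:
      # mount the host's custom config as the container's clearml.conf
      - /path/on/host/clearml.conf:/root/clearml.conf:ro
```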
OutrageousGrasshopper93 is `--gpus all` working?
I think this is the discussion you are after:
https://clearml.slack.com/archives/C01H5VAUZ8R/p1612452197004900?thread_ts=1612273112.002400&cid=C01H5VAUZ8R
Hi @<1561885941545570304:profile|PunyKangaroo87>
What do you mean by store data locally?
Like clearml-data, i.e. Dataset?
You can always use file:///root/path/folder as destination, this will store everything into the local folder, is that it?
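For instance, a minimal sketch (dataset/project names are assumptions):
```python
from clearml import Dataset

ds = Dataset.create(dataset_name="my_dataset", dataset_project="examples")
ds.add_files("/data/to/version")
# store the dataset contents in a local folder instead of a remote object store
ds.upload(output_url="file:///root/path/folder")
ds.finalize()
```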
Hi DepressedFish57
In my case download each part takes ~5 second, and unzip ~15.
We ran into that, and the new version will employ a multithreaded approach for the unzip (meaning the unzipping will happen in the background)
Yes, actually the first step would be a toggle button for regexp in the search; the second would be even more advanced search.
May I suggest you post it on the UI suggestion issue https://github.com/allegroai/trains/issues/81 ?
The first pipeline step is calling init
GiddyPeacock64 Is this enough to track all the steps?
I guess my main question is: is every step in the pipeline an actual Task/Job, or is it a single small function?
Kubeflow is great for simple DAGs but when you need to build more complex logic it is usually a bit limited
(for example the visibility into what's going on inside each step is missing so you cannot make a decision based on that).
WDYT?
Hi RattySeagull0
I'm trying to execute trains-agent in docker mode with conda as package manager, is it supported?
It should, that said we really do not recommend using conda as a package manager (it is a lot slower than pip, and it can create an environment that is very hard to reproduce due to conda's internal "compatibility matrix", which might change from one conda version to another)
"trains_agent: ERROR: ERROR: package manager "conda" selected, but 'conda' executable...
For future reference, this is indeed a PEP-610 related bug, f
👍
Can we also set the poetry version used?
Actually the agent assumes poetry is preinstalled (so whatever you already have in the docker image) ...
That said, maybe we should install a specific version (after installing pip, we could do that if poetry is selected)
wdyt ?
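For reference, selecting poetry in the agent's clearml.conf looks like the sketch below; pinning the poetry version itself would currently have to happen in the docker image (e.g. a `RUN pip install "poetry==1.8.3"` line, the version number being an arbitrary example):
```
# clearml.conf (agent side): restore the environment with poetry
agent {
    package_manager {
        type: poetry
    }
}
```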
Not intentional! When I launched the AMI it was running an older version
I think this is exactly the reason they decided to change the location 🙂 so you will have to manually upgrade; the reasoning is we changed directory names (and maybe a few more things)
Yes: shut down the current docker-compose, curl the new docker-compose, rename the folder, then spin it up again. Full instructions here:
https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_aws_ec2_ami.html#upgrading
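Roughly, the steps look like this (a sketch based on the linked instructions; paths may differ on your AMI):
```bash
# stop the running server (old "trains" location)
docker-compose -f /opt/trains/docker-compose.yml down

# rename the data folder (old "trains" naming -> new "clearml" naming)
sudo mv /opt/trains /opt/clearml

# fetch the latest docker-compose file
curl https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml -o /opt/clearml/docker-compose.yml

# spin it up again
docker-compose -f /opt/clearml/docker-compose.yml up -d
```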
Hi, Is there a way to stop a clearml-agent from within an experiment?
It is possible but only in the paid tier (it needs backend support for that) 😞
My use case is: a spot instance marked for termination after 2 mins by AWS
Basically what you are saying is you want the instance to spin down after the job is completed, correct?
NastyOtter17
Usually the first report will happen after 30 seconds, could that be the difference?
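If timing is the issue, the periodic flush interval can be tuned in clearml.conf (a sketch; the initial ~30s delay for resource monitoring is built in):
```
# clearml.conf (sdk side): how often metrics are flushed to the server
sdk {
    development {
        worker {
            report_period_sec: 2
        }
    }
}
```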
Hi @<1524922424720625664:profile|TartLeopard58>
- Opened container ports for VS Code, JupyterLab, and SSH.
I think that by default it uses the host network so it can take care of that; are you saying you added the k8s integration?
- Added NodePort to the service to directly access via public IP:NodePort (previously only SSH was available, but now NodePort is added for VS Code and JupyterLab as well), allowing direct access without SSH tunneling.
Interesting!
- Considering security vulnerabilitie...
Thanks MuddyCrab47 !!!
I found it!
It turns out the artifact upload will always upload from a stream (i.e. no multi-part upload). I will make sure we fix it in the next RC (I think the plan is to have it out this weekend)
Hi GiganticTurtle0
you should actually get `file:///home/user/local_storage_path`
with the `file://` prefix.
We always store the file:// prefix to note that this is a local path
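For example, a minimal sketch (project/task names are assumptions): setting a local output_uri stores artifacts/models under that folder, recorded with the file:// prefix:
```python
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="local-output",
    output_uri="file:///home/user/local_storage_path",  # local destination
)
```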
This is exactly what I did here, and it is working 😞
https://demoapp.demo.clear.ml/projects/0e919ea1cc5c499b99e1ab85004b6e97/experiments/887edef09d4549e88b829a34c87d4d5b/output/execution
JitteryCoyote63 This seems like exactly what you are saying, elastic license issue...
for example train.py & eval.py under the same repo
Suppose that I have three models and these models can't be loaded simultaneously in GPU memory
Oh!!!
For now, this is the behavior I observe: Suppose I have two models, A and B. ....
Correct
Yes this is a current limitation of the Triton backend BUT!
we are working on a new version that does exactly what you mentioned (because it is such a common case, where some models are not being used that frequently)
The main caveat is the loading time, re-loading models from dist...
Hi @<1545216070686609408:profile|EnthusiasticCow4>
Many of the dataset we work with are generated by SQL query.
The main question in these scenarios is: are those DBs stable?
By that I mean, generally speaking DBs serve applications, and from time to time they undergo migrations (i.e. changes in schema, more/less data, etc.).
The most stable way is to create a script that runs the SQL query, and creates a clearml dateset from it (that script becomes part of the Dataset, to have full tracta...
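As a sketch (query, connection string, and names are all placeholders), such a script could look like:
```python
import pandas as pd
import sqlalchemy
from clearml import Dataset

QUERY = "SELECT * FROM training_samples"  # placeholder query
engine = sqlalchemy.create_engine("postgresql://user:pass@db-host/mydb")  # placeholder DSN

# snapshot the query result to a file
df = pd.read_sql(QUERY, engine)
df.to_csv("query_result.csv", index=False)

# version the snapshot as a ClearML dataset
ds = Dataset.create(dataset_name="sql_snapshot", dataset_project="examples")
ds.add_files("query_result.csv")
ds.upload()
ds.finalize()
```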
The image is allegroai/clearml:1.0.2-108
Yep, that makes sense, seems like a backwards compatibility issue
Hi @<1541954607595393024:profile|BattyCrocodile47>
see here: None
Try with app.clearml.mlops-club.org
and the rest of them
Hi @<1649221394904387584:profile|RattySparrow90>
Are the models I defined to be served e.g. via the CLI downloaded to the serving pod
Yes, this is done automatically and online (i.e. when you update them using the CLI/API), based on the models/endpoints you set
So that they are physically lying there as a file I can see in the filesystem?
They are, and cached there
Or is it more the case that the pod gets the model when needed/when an API call for this model is incoming?
I...
Hi JitteryCoyote63 a few implementation details on the services-mode, because I'm not certain I understand the issue.
The docker-agent (running in services mode) will pick a Task from the services queue, set up the docker for it, spin it up, and make sure the Task starts running inside the container (once it is running inside the docker you will see the service Task registered as an additional node in the system, until the Task ends); once that happens the trains-agent will try to fetch the...
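For reference, launching an agent in services mode typically looks like this (the queue name here is an assumption, not your exact setup):
```bash
clearml-agent daemon --services-mode --queue services --docker --cpu-only
```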
I see.
You can get the offline folder programmatically, then copy the folder content (it's the same as the zip, and you can also pass a folder instead of a zip to the import function):
task.get_offline_mode_folder()
You can also have a soft link of the offline folder (if you are working on a linux machine):
ln -s myoffline_folder ~/.trains/cache/offline
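Putting it together, a minimal offline-mode sketch (project/task names are assumptions):
```python
from clearml import Task

Task.set_offline(offline_mode=True)        # record everything locally
task = Task.init(project_name="examples", task_name="offline-run")
# ... training / logging code ...
folder = task.get_offline_mode_folder()    # the session data lives here
task.close()

# later, on a machine that can reach the server:
# Task.import_offline_session(str(folder))  # accepts the folder or the zip
```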
LudicrousParrot69
I "think" I have a better handle on what you wish to do.
Is it a kind of generic "serving" solution?
FYI:
Model artifact is, usually, a weights/model file. The idea is that later you will be able to access it and serve it. Now the problem is (and I think this is what you are referring to) that there is usually a specific piece of code tied to that model that knows how to use it (a.k.a. pyfunc)
A few ideas:
These days everyone is trying to build their models with a generic interface, so that scik...
SubstantialElk6 Ohh okay I see.
Let's start with background on how the agent works:
When the agent pulls a job (Task), it will clone the code based on the git credentials available on the host itself, or based on the git_user/git_pass configured in ~/clearml.conf
https://github.com/allegroai/clearml-agent/blob/77d6ff6630e97ec9a322e6d265cd874d0ab00c87/docs/clearml.conf#L18
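As a sketch, the relevant clearml.conf entries on the agent machine (values are placeholders):
```
# clearml.conf (agent side): credentials used when cloning the Task's repository
agent {
    git_user: "my-git-user"    # placeholder
    git_pass: "my-git-token"   # placeholder (a personal access token works here)
}
```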
The agent can work in two modes:
Virtual environment mode, where it will create a new venv for each experiment ba...