Okay, so basically set a template for the pod, specifying the docker image. Make sure you pass the correct trains-server configuration (i.e. api/web/file server addresses and credentials), and select the queue name the agent will listen to.
container image / details
https://hub.docker.com/r/allegroai/trains-agent
https://github.com/allegroai/trains-agent/tree/master/docker/agent
Full environment variable list to pass can be found here:
https://github.com/allegroai/trains-server/blob/...
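A minimal sketch of such a pod template, built as a plain Python dict so it can be dumped to JSON/YAML. The TRAINS_* variable names are the standard trains configuration environment variables; the function name, the pod/container name, and overriding the container command with `trains-agent daemon --queue` are assumptions for illustration (the image may already ship its own entrypoint), so check them against the image links above.

```python
import json


def agent_pod_template(image, queue, api_host, web_host, files_host,
                       access_key, secret_key):
    """Build a Kubernetes Pod manifest (as a dict) for a trains-agent pod."""
    # trains-server addresses and credentials, passed as environment variables
    env = {
        "TRAINS_API_HOST": api_host,
        "TRAINS_WEB_HOST": web_host,
        "TRAINS_FILES_HOST": files_host,
        "TRAINS_API_ACCESS_KEY": access_key,
        "TRAINS_API_SECRET_KEY": secret_key,
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "trains-agent"},  # hypothetical pod name
        "spec": {
            "containers": [
                {
                    "name": "trains-agent",
                    "image": image,
                    "env": [{"name": k, "value": v} for k, v in env.items()],
                    # Assumption: start the agent daemon on the chosen queue
                    "command": ["trains-agent", "daemon", "--queue", queue],
                }
            ]
        },
    }


if __name__ == "__main__":
    pod = agent_pod_template(
        image="allegroai/trains-agent",
        queue="default",
        api_host="http://trains-server:8008",
        web_host="http://trains-server:8080",
        files_host="http://trains-server:8081",
        access_key="<access_key>",
        secret_key="<secret_key>",
    )
    print(json.dumps(pod, indent=2))
```

In a real cluster the credentials would more likely come from a Kubernetes Secret than from literal values in the manifest.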
WickedGoat98
Put the agent.docker_preprocess_bash_script in the root of the file (i.e. you can just add the entire thing at the top of the trains.conf)
Might it be possible that I can place a trains.conf in the mapped local folder containing the filesystem and mongodb data etc e.g.
I'm assuming you are referring to the trains-agent services. If this is the case, sure you can.
Edit your docker-compose.yml, under line https://github.com/allegroai/trains-server/blob/b93591ec3226...
Any chance @<1578918150261444608:profile|RoundJellyfish71> you can open a GitHub issue so that we can track it? (I think this is indeed a good idea)
Hi @<1547028031053238272:profile|MassiveGoldfish6>
The issue I am running into is that this command does not give me the dataset version number that shows up in the UI.
Oh no, I think you are correct, it will not return the version per dataset 😞 (I will make sure we add it)
But with the dataset ID you can grab all the properties: Dataset.get(dataset_id="aabbcc").version
wdyt
I have to assume that I do not know the dataset ID
Sorry I mean:
datasets = Dataset.list_datasets(dataset_project="some_project")
for d in datasets:
    d["version"] = Dataset.get(dataset_id=d["id"]).version
wdyt?
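The loop above can be wrapped in a small helper, sketched here. The function name and the idea of passing the Dataset class in as an argument are assumptions for illustration (it keeps the sketch testable without a server); only list_datasets(dataset_project=...) and get(dataset_id=...).version come from the messages above. Note it issues one extra request per dataset.

```python
def list_datasets_with_versions(dataset_cls, project):
    """Return the project's dataset entries, each with a "version" key added.

    dataset_cls is expected to behave like clearml.Dataset, i.e. provide
    list_datasets(dataset_project=...) and get(dataset_id=...).
    """
    datasets = dataset_cls.list_datasets(dataset_project=project)
    for d in datasets:
        # One lookup per dataset: fetch the full object to read its version
        d["version"] = dataset_cls.get(dataset_id=d["id"]).version
    return datasets


# Usage against a real server (requires clearml to be installed and configured):
#   from clearml import Dataset
#   for d in list_datasets_with_versions(Dataset, "some_project"):
#       print(d["id"], d["version"])
```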
E.g. I might need to have different N-numbers for the local and remote (ClearML) storage.
Hmm yes, that makes sense
That'd be a great solution, thanks! I'll create a PR shortly
Thank you! 🙏 🤩
But this is the clearml python package, it is not really related to the server. Could it be you also updated the clearml package?
GiganticTurtle0 notice that when you spin an agent with --services-mode, you basically let it run many Tasks at once (this is in contrast to the default behavior, when you have one Task per agent).
Great ascii tree 🙂
GrittyKangaroo27 assuming you are doing:
@PipelineDecorator.component(..., repo='.')
def my_component():
    ...
The function my_component will be running in the repository root, so in theory it could access the packages 1/2
(I'm assuming here directory "project" is the repository root)
Does that make sense ?
BTW: when you pass repo='.' to @PipelineDecorator.component it takes the current repository that exists on the local machine running the pipel...
ElegantCoyote26 what is the model input layer definition? This implies the data format to pass to the serve endpoint
Ok the doc needs fix (edited)
suggestion?
suspect permissions, but not entirely sure what and where
Seems like it.
Check the config file on the agent machine
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L18
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L19
Hi LivelyLion31
Yes, that is exactly why we designed Trains with an automagic integration: users do not need to learn another package, and with almost no effort you get most of the benefits.
Regarding the TB files, from experience most users will use the TB files shortly after they executed the experiment, usually for debugging and in depth capabilities (like network debugger profile etc), metric view is something that is much easier to do on a centralized server (like on...
And it works correctly when running on my computer, and if I use colab, then for some reason it has no effect.
I think I'm lost on this one, when running in colab, is this continuing a previous experiment ?
Hi SteadyFox10
Short answer no 😞
Long answer, full permissions are available in the paid tier, alongside a few more advanced features.
Fortunately in this specific use case, the community service allows you to share a single (or multiple) experiments with a read-only link. Would that work ?
ShinyLobster84
fatal: could not read Username for '': terminal prompts disabled
This is the main issue: it needs git credentials to clone the repo code containing the pipeline logic (this is the exact same behaviour as pipeline v1 execute_remotely(), which is now the default; could it be that you previously executed the pipeline logic locally?)
WackyRabbit7 could the local/remote pipeline logic apply in your case as well?
maybe worth updating the main Readme.md in the github.. if someone tries to follow the instructions there it breaks
Hmm, I thought we already did. Yes, you are absolutely correct, I'll make sure we do
Should be fairly easy to add no?
(I'll make sure it is added to the docstring because apparently it was not there)
Good question 🙂
from clearml import Task
Task.init('examples', 'test')
FranticCormorant35
See here https://github.com/allegroai/trains/blob/master/examples/manual_reporting.py#L42
So if any step corresponding to 'inference_orchestrator_1' fails, then 'inference_orchestrator_2' keeps running.
GiganticTurtle0 I'm not sure it makes sense to halt the entire pipeline if one step fails.
That said, how about using the post_execution callback, then check if the step failed, you could stop the entire pipeline (and any running steps), what do you think?
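A rough sketch of what such a callback could look like. It assumes the callback receives the pipeline controller and the step node, that the node exposes a job object with an is_failed() method, and that the controller has a stop() method; treat these names as assumptions to verify against the installed clearml version, and also check that the callback is actually invoked for failed steps.

```python
def stop_pipeline_on_failed_step(pipeline, node):
    """Post-execution callback: stop the whole pipeline if this step failed.

    Returns True if a stop was requested, False otherwise (the return value
    is only used here to make the behaviour easy to check).
    """
    job = getattr(node, "job", None)  # assumption: node.job holds the step's job
    if job is not None and job.is_failed():
        # Assumption: stop() halts the pipeline and any running steps
        pipeline.stop()
        return True
    return False


# Hypothetical registration on a step (argument name is an assumption):
#   pipe.add_step(name="inference_orchestrator_1", ...,
#                 post_execute_callback=stop_pipeline_on_failed_step)
```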
So in theory you can clone yourself 2 extra times and push into an execution queue, but the issue might be actually making sure the resources are available. What did you have in mind?
BTW,
has this at the bottom:
Yes, it is the company legal entity name. But I think that for referencing it makes more sense to mention the product name ClearML
I think this looks good 🙂
At the moment I'm querying by paging through the tasks as you recommended, and then filtering with standard python list-comprehension filters...Which is less than ideal.
At least let's do that better:
Use Task._query_tasks:
Task._query_tasks(order_by=['-started'], page_size=10, page=0, only_fields=['id', 'started'])
You will get "lighter" objects returned, then you can filter them with code (but the request will be a lot faster)
SuccessfulKoala55 any suggestion on improving that ?
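To walk through all pages with a call like the one above, a small generator can do the paging. This is a sketch: iter_tasks is a hypothetical helper name, and it only assumes that the query callable accepts page and page_size and returns a list per page (an empty or short page is taken to mean the end).

```python
def iter_tasks(query, page_size=10, **query_kwargs):
    """Yield results from a paged query callable, one page at a time.

    query is called as query(page=..., page_size=..., **query_kwargs) and is
    expected to return a list; paging stops on an empty or short page.
    """
    page = 0
    while True:
        batch = query(page=page, page_size=page_size, **query_kwargs)
        if not batch:
            return
        yield from batch
        if len(batch) < page_size:
            return  # last (partial) page reached
        page += 1


# Usage against a real server (requires clearml to be installed and configured):
#   from clearml import Task
#   light_tasks = iter_tasks(Task._query_tasks, page_size=100,
#                            order_by=['-started'], only_fields=['id', 'started'])
#   recent = [t for t in light_tasks if t.started is not None]
```

The client-side filter stays, but each request only carries the requested fields.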
BTW: see if this works:
$ CLEARML_API_HOST_VERIFY_CERT=0 clearml-init
JitteryCoyote63 The release was delayed due to a last minute issue, should be released later today. Anyhow the code is updated on GitHub, so you can start implementing :) let me know if I can be of help :)
I'm kind of at a point where I don't know a lot of what to even search for.
we feel you 💗 , yes there still isn't a very good source of information on where to get started...
This is because the entire field is constantly changing and evolving, and one solution will usually only apply to specific use case...
I would start with the mlops community slack channel, and youtube talks (specifically those by companies describing how they built their own internal infrastructure, i...