i.e. run pip install --upgrade trains
We're wondering how many on-premise machines we'd like to deprecate.
I think you can see that in the queues tab, no?
the Task scheduler itself is a Task. What we did is add a new parameter section on the Task (the task.connect call), so that we can later clone it, modify the section, and use the new values at runtime
(Task.connect will put the data from the Task/UI back into the dict when the agent is running the Scheduler)
Does that make sense?
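For example, a minimal sketch of the pattern (the schedule_config fields here are illustrative, not the scheduler's actual parameters):

from clearml import Task

task = Task.init(project_name="services", task_name="scheduler")
# Hypothetical config dict; task.connect registers it as a parameter
# section, so a clone can be edited in the UI and the agent will write
# the edited values back into this dict at runtime.
schedule_config = {"interval_minutes": 60, "target_queue": "default"}
task.connect(schedule_config, name="schedule")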
Yes that would work 🙂
You can also put it in the docker-compose, see TRAINS_AGENT_DEFAULT_BASE_DOCKER
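For example, a sketch of the relevant docker-compose entry (the service layout and base image name are placeholders):

services:
  agent-services:
    environment:
      TRAINS_AGENT_DEFAULT_BASE_DOCKER: "nvidia/cuda:10.1-runtime-ubuntu18.04"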
UnevenDolphin73 something like this one?
https://github.com/allegroai/clearml/pull/225
Many thanks! I'll pass it on to the technical writers 🙂
In our case, we have a custom YAML instruction !include, i.e.
Hmm, interesting. In theory this might work, since configuration encoding (when passing dicts) is handled with HOCON, which does support referencing.
That said, currently it is not aware of "remote configurations", only ENV variables and local files.
It would be cool to add. Do we have a github issue on that? (Would you like to see if you can PR such a thing?)
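For reference, this is the kind of referencing HOCON already supports today (values, local file includes, and ENV variables, but nothing remote):

base_dir = "/data/models"
# substitution of another key
model_path = ${base_dir}"/resnet50.pt"
# optional substitution from an ENV variable
api_host = ${?CLEARML_API_HOST}
# local file include (a remote "!include" is the missing piece)
include "extra.conf"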
Hi @<1526371965655322624:profile|NuttyCamel41>
so sorry, I just realized I have not answered it!
I just tried the pytorch example from the clearml-serving repo and got the error about the wrong model name
okay that is odd, are you using the exact same containers / docker-compose? what is the difference?
I0603 09:44:02.665851 41 model_lifecycle.cc:693] successfully loaded 'test_model_pytorch' version 1
does that mean that even though there is a warning there you can curl to ...
any idea why I cannot select text inside the table?
Ichh, seems again like plotly 🙂 I have to admit it's quite annoying to me as well ... I would vote here: None
Hmm, maybe a different numpy version? (numpy==1.22.1, maybe the Task needs a different version?) Can you post the Task log?
Just curious about the timeout, was it configured by ClearML or by GCS? Can we customize the timeout?
I'm assuming this is GCS; in the end, the actual upload is done by the GCS python package.
Maybe there is an env variable ... Let me google it
Why? The task should have completed successfully, how is this aborting?
Early stopping by the HPO process, like hyper-band, e.g. "this training model is going nowhere, let's stop it".
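To illustrate, a rough sketch of such an HPO controller using clearml's optimizer (the metric names and parameter range are assumptions; the hyper-band behavior would come from swapping in OptimizerBOHB from clearml.automation.hpbandster, which needs the hpbandster package):

from clearml.automation import HyperParameterOptimizer, UniformParameterRange

optimizer = HyperParameterOptimizer(
    base_task_id="<training task id>",  # the Task to clone per trial
    hyper_parameters=[UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1)],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    execution_queue="default",
    max_iteration_per_job=10000,  # under-performing trials get aborted early
)
optimizer.start()
optimizer.wait()
optimizer.stop()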
And you are seeing a bunch of the GS SSL errors?
SubstantialElk6
Hmm do you have torch in the "installed packages" section of the Task?
(This is what the agent uses to set up the environment inside the docker, running as a pod)
Could you test with the latest "clearml"?
pip install git+https://github.com/allegroai/clearml.git
Task.add_requirements(".") should be supported now 🙂
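A short sketch of the call order, assuming the "." semantics above (the call must come before Task.init):

from clearml import Task

# register the local package requirements before the Task is created
Task.add_requirements(".")
task = Task.init(project_name="examples", task_name="local requirements")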
SmarmySeaurchin8
args = parse.parse()
task = Task.init(project_name=args.project or None, task_name=args.task or None)
You should probably look at the docstring 🙂
:param str project_name: The name of the project in which the experiment will be created. If the project does
not exist, it is created. If project_name is None, the repository name is used. (Optional)
:param str task_name: The name of Task (experiment). If task_name is None, the Python experiment
...
Hi MinuteGiraffe30
Are you saying that when you are running your code locally with a gitea repository, clearml incorrectly adds a link to gitlab?
StickyBlackbird93 the agent is supposed to resolve the correct version of pytorch based on the CUDA version in the container. Sounds like for some reason it fails? Can you provide the log of the Task that failed? Are you running the agent in docker-mode, or inside a docker?
Yes, VexedCat68, consider the txt file the Dataset "content"; this will enable you to safely get the list of files, and then you can use the StorageManager to download them.
Extend this concept and have it built into the Dataset itself, i.e. allow you to add files as links and make sure it will just download them. The caveat here is that the Dataset, at the end, returns a folder with the files; when you specify links, you have to also specify the target location locally (at the end you want a fol...
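A minimal sketch of that pattern (dataset name, txt file name, and its contents are assumptions):

import os
from clearml import Dataset, StorageManager

# the Dataset "content" is just a txt file listing remote links
folder = Dataset.get(dataset_name="my_links_dataset").get_local_copy()
with open(os.path.join(folder, "files.txt")) as f:
    links = [line.strip() for line in f if line.strip()]

# download each link; the files land in the local cache
local_files = [StorageManager.get_local_copy(remote_url=link) for link in links]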
Hi NonsensicalSeaanemone47
I'm assuming you mean k8s as compute cluster?
If so, then yes, clearml adds priority scheduling on top of your existing k8s cluster. It also allows you to reuse images, as k8s spins up the base container image and then, inside the container, the agent sets up the environment of the experiment (clones the code, applies the diff, installs missing python packages, etc.).
It also gives visibility into the executed pods.
Make sense?
So this is an additional config file with enterprise?
Extension to the "clearml.conf" capabilities
Is this new config file deployable via helm charts?
Yes, you can also set it company/user-wide using the clearml Vault feature (again enterprise, sorry 🙂)
MysteriousBee56 when you execute your code once, it will appear in the server (with all fields pre-populated based on your setup/git etc.). Once it is there you can "clone" the experiments and move them around.
Is this what you mean?
A bit of background: the idea behind Trains is that the environment definition (i.e., git repo, packages, code entry point, arguments, etc.) is collected when executing the code. This avoids the tedious task of generating and maintaining YAML/JSON configuration files.
What is exa...
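Concretely, the collection happens with the single call at the top of your script, roughly:

from clearml import Task

# at this point the git repo, uncommitted diff, installed packages and
# argparse arguments are all captured automatically; no config file needed
task = Task.init(project_name="examples", task_name="my experiment")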
DM me the entire log, I would assume this is something with the configuration
Hi ThoughtfulBadger56
Just add --stop to the clearml-agent
(the exact same command as you used to spin it up, just add --stop at the end and it will stop it; or just run clearml-agent daemon --stop and it will iteratively close them)
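For example (queue name and flags here are placeholders):

# the command used to spin the agent ...
clearml-agent daemon --queue default --docker
# ... the same command with --stop appended stops it
clearml-agent daemon --queue default --docker --stop
# or close all daemons on this machine iteratively
clearml-agent daemon --stop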