What's the difference between the example pipeline and this code?
Could it be the "parents" argument? What is it?
Hi LazyLeopard18 ,
See details below, are you using the win10 docker-compose yaml?
https://github.com/allegroai/trains-server/blob/master/docs/install_win.md
Meanwhile check CreateFromFunction(object).create_task_from_function(...)
It might be better suited than execute remotely for your specific workflow 🙂
Hi @<1546303293918023680:profile|MiniatureRobin9>
I'm not sure I understand the difference between a worker and an agent.
hmm we should probably make that clearer 🙂
agent = the clearml-agent instance running on the machine
worker is the system term representing the instance of the agent
You can have one machine with multiple agents (i.e. multiple workers) running on it.
Does that make sense ?
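To make the "one machine, multiple agents" case concrete, here is a minimal sketch that only builds the command lines for one agent per GPU (it does not launch anything). The queue name `default`, the GPU indices and the helper `agent_commands` are assumptions for illustration; the `--queue`, `--gpus` and `--detached` flags are the `clearml-agent daemon` options as I understand them.

```python
from typing import List


def agent_commands(queue: str, gpu_ids: List[int]) -> List[List[str]]:
    """Build one `clearml-agent daemon` command line per GPU (hypothetical helper)."""
    return [
        ["clearml-agent", "daemon", "--queue", queue, "--gpus", str(gpu), "--detached"]
        for gpu in gpu_ids
    ]


# Two agents on the same machine: each one registers as its own worker.
commands = agent_commands("default", [0, 1])
for argv in commands:
    print(" ".join(argv))
```

Each command, once run on the machine, would show up as a separate worker in the system.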
let me check
Epochs are still round numbers ...
Multiply by 2?! 😅
OddAlligator72 let's separate the two issues:
1. Continue reporting from a previous iteration
2. Retrieving a previously stored checkpoint
Now for the details:
Are you referring to a scenario where you execute your code manually (i.e. without the trains-agent) ?
PompousBeetle71 let me know if it solves your problem
Thanks PompousBeetle71
Quick question, what frameworks are you using?
Do you use the save method directly to a file stream (or any other direct storage)?
Both are fully implemented in the enterprise version. I remember a few medical use cases, and I think they are working on publishing a blog post on it, not sure. Anyhow, I suggest you contact the sales people and I'm sure they will gladly set up a call/demo/PoC.
https://allegro.ai/enterprise/#contact
MinuteGiraffe30 if you run the following command while your current directory is where your code is, what are you getting?
$ git ls-remote --get-url origin
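If it is easier to check this from Python, here is a small sketch that runs the same command and returns whatever it prints. The helper name `get_origin_url` is made up for illustration, and it simply returns None if git is missing or the command fails.

```python
import subprocess
from typing import Optional


def get_origin_url(repo_dir: str = ".") -> Optional[str]:
    """Return the output of `git ls-remote --get-url origin`, or None on failure."""
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--get-url", "origin"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, the directory is invalid, or git returned an error
        return None
    return result.stdout.strip() or None


url = get_origin_url(".")
print(url)
```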
Hi @<1739818374189289472:profile|SourSpider22>
What are you trying to install, just the agent? If so, pip install clearml-agent is all you need.
You are correct, the agent will clone the git repository and install the requirements, as written in the task's installed packages section. Regarding the git branch, notice it will pull the specific commit id as stated in the execution section, and it will apply any uncommitted changes. You can edit the execution section and change the commit to the latest in a specific version (you should probably also clear the uncommitted changes if you do that).
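As a rough mental model of the steps above (not the agent's actual implementation), the sequence is close to what you would do by hand. The helper `manual_setup_steps`, the `repo` folder name, the diff path and the `installed_packages.txt` file name below are all hypothetical.

```python
from typing import List, Optional


def manual_setup_steps(
    repo_url: str,
    commit_id: str,
    diff_path: Optional[str] = None,
    packages_file: str = "installed_packages.txt",
) -> List[str]:
    """List the manual commands that roughly mirror what the agent does."""
    steps = [
        f"git clone {repo_url} repo",         # clone the repository
        f"git -C repo checkout {commit_id}",  # pin the exact commit from the execution section
    ]
    if diff_path:
        # re-apply the uncommitted changes stored with the task
        steps.append(f"git -C repo apply {diff_path}")
    # install the packages listed in the installed packages section
    steps.append(f"pip install -r {packages_file}")
    return steps


for step in manual_setup_steps(
    "https://example.com/me/project.git", "abc1234", diff_path="/tmp/uncommitted.diff"
):
    print(step)
```

If you change the commit and clear the uncommitted changes, that corresponds to calling it without `diff_path`.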
Thanks CharmingShrimp37 !
Could you PR the fix ?
It will be just in time for the 0.16 release 🙂
and do you have import tensorflow in your code?
This is strange... Could you send the browser console log, maybe there is an exception there
If the problem persists (i.e. trains failing to detect packages), please open a GitHub issue so the bug will not get lost 🙂
WackyRabbit7
Cool - so that means the fileserver which comes with the host will stay empty? Or is there anything else being stored there?
Debug Images and artifacts will be automatically stored to the file server.
If you want your models to be automagically uploaded, add the following: task = Task.init('example', 'experiment', output_uri='
')
(You can obviously point it to any other http/S3/GS/Azure storage)
The versions don't need to match, any combination will work.
Yes, I mean trains-agent. Actually I am using 0.15.2rc0. But, I am using local files, I mean I clone trains and trains-agent repos and install them. Their versions are 0.15.2rc0
I see, that's why we get the git ref, not package version.
Specifically, notice steps (1) and (2); they are important for the Windows Docker service to be able to run the Elasticsearch and MongoDB containers.
Do you have a git repo link in the execution section of the experiment?
I would clone the first experiment, then in the cloned experiment, I would change the initial weights (assuming there is a parameter storing that) to point to the latest checkpoint, i.e. provide the full path/link. Then enqueue it for execution. The downside is that the iteration counter will start from 0 and not the previous run.
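The same clone / edit / enqueue flow can also be scripted. This is only a sketch under assumptions: it uses the current `clearml` package name (in the trains era the import would be `from trains import Task`), and the parameter name `Args/initial_weights`, the queue name `default` and the helper `rerun_from_checkpoint` are hypothetical, so adjust them to whatever your experiment actually exposes.

```python
def rerun_from_checkpoint(
    source_task_id: str,
    checkpoint_url: str,
    queue_name: str = "default",
    weights_param: str = "Args/initial_weights",
) -> str:
    """Clone an experiment, point its initial weights at a checkpoint, and enqueue it."""
    # Imported lazily so the sketch can be loaded without clearml installed.
    from clearml import Task

    source = Task.get_task(task_id=source_task_id)
    cloned = Task.clone(source_task=source, name=f"{source.name} (resumed)")
    # Hypothetical parameter name: use the one that stores the initial weights in your code.
    cloned.set_parameter(weights_param, checkpoint_url)
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned.id
```

As noted, the cloned run still starts counting iterations from 0.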
MysteriousBee56 when you execute your code once, it will appear in the server (with all fields pre-populated based on your setup/git etc.). Once it is there, you can "clone" it and move it around.
Is this what you mean?
A bit of background: the idea behind Trains is that the environment definition (i.e. git repo, packages, code entry point, arguments etc.) is collected when executing the code. This avoids the tedious task of generating and maintaining YAML/JSON configuration files.
What is exa...
In regards to the YAML how would you pass data? Like the pipeline from tasks example?
Hi @<1570220858075516928:profile|SlipperySheep79>
I think this is more complicated than one would expect. But as a rule of thumb, console logs and metrics are the main ones. I hope it helps? Maybe sort by number of iterations in the experiment table ?
BTW: probably better to ask in the channel
BTW: UnevenDolphin73 you should never actually do "task = clearml.Task.get_task(clearml.config.get_remote_task_id())"
You should just do "Task.init()"; it will automatically take the "get_remote_task_id" and do all sorts of internal setups. You will end up with the same object, but in an ordered fashion.
Yes, even without any arguments given to Task.init(), it has everything from the server.
My driver says "CUDA Version: 11.2" (I am not even sure this is correct, since I do not remember installing code in this machine, but idk) and there is no pytorch for 11.2, so maybe it falls back to CPU?
For some reason it detects CUDA 11.1 (I assume this is what you have installed; the driver CUDA version is the highest it will support, not necessarily what you have installed).
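To illustrate the "highest it will support" point, here is a minimal sketch that picks the newest available CUDA build that does not exceed the driver's maximum, and returns None (think CPU fallback) when nothing fits. This is not how the agent actually resolves packages; the helper names and the list of available builds are assumptions for the example.

```python
from typing import Iterable, Optional, Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Turn '11.2' into (11, 2) so versions compare numerically."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def pick_cuda_build(driver_max: str, available_builds: Iterable[str]) -> Optional[str]:
    """Return the newest build whose CUDA version is <= the driver's maximum."""
    limit = parse_version(driver_max)
    candidates = [b for b in available_builds if parse_version(b) <= limit]
    return max(candidates, key=parse_version) if candidates else None


# Driver reports 11.2 as its maximum; hypothetical builds exist for 10.2 and 11.1.
print(pick_cuda_build("11.2", ["10.2", "11.1"]))  # -> 11.1
```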