data:image/s3,"s3://crabby-images/ea8fc/ea8fc4a242d3fbf9f124d8906a48b69b89ea53a2" alt="Profile picture"
Reputation
Badges 1
Ohh, hmm, that is odd, there should not be a limit there. Let me check...
Okay, that makes sense. If this is the case, I would just use clearml-agent execute --id <task_id here>
to continue the training Task.
Do notice you have to reload your last checkpoint from the Task's models/artifacts to continue 🙂
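A minimal sketch of the "reload your last checkpoint" step, assuming the training code runs inside the Task that the agent re-executes and that checkpoints were registered as the Task's output models. `Task.current_task()`, `task.models["output"]` and `Model.get_local_copy()` are ClearML SDK calls; the helper names `find_last_checkpoint` / `resume_path_for_current_task` are hypothetical, and the `task` parameter is injected only so the selection logic can be tried without a ClearML server.

```python
from typing import Any, Optional


def find_last_checkpoint(task: Any) -> Optional[str]:
    """Return a local path to the most recent output model of `task`, or None.

    Assumption: task.models["output"] lists output models in registration
    order, so the last entry is the latest checkpoint.
    """
    output_models = list(task.models["output"])
    if not output_models:
        return None  # fresh run: nothing to resume from
    return output_models[-1].get_local_copy()


def resume_path_for_current_task() -> Optional[str]:
    # Lazy import so find_last_checkpoint() works without clearml installed.
    from clearml import Task

    task = Task.current_task()
    if task is None:
        return None  # not running inside a ClearML Task
    return find_last_checkpoint(task)
```

The returned path would then go to your framework's own load function before training continues from the saved state.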
Last question: what is the HPO optimization algorithm? Is it just grid/random search, or Optuna/BOHB? If it is the latter, how do I make it "continue"?
Hi EnviousStarfish54
After the pop-up, do you see the plot in the web UI?
Working on it as we speak 🙂 probably a day, worst case 2. This is quite strange and we are not sure where the fault is, as nothing in the code itself changed...
Hi @<1523701079223570432:profile|ReassuredOwl55> let me try to add some color here:
Basically we have two parts: (1) the pipeline logic, i.e. the code that drives the DAG, and (2) the pipeline components, e.g. model verification.
The pipeline logic (1), i.e. the code that creates the DAG and the Tasks and enqueues them, will be running in the GitHub Actions context, i.e. this is the automation code. The pipeline components themselves (2), e.g. model verification, training etc., are running using the clearml agents...
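A minimal sketch of that split, assuming the ClearML `PipelineController` API: `main()` is the pipeline logic (1) a CI job would run, and `start_locally(run_pipeline_steps_locally=False)` keeps the DAG-driving code in the CI process while each step (2) is enqueued for an agent. The project, Task names, queue name and the `build_pipeline` helper are hypothetical placeholders, and the steps assume existing template Tasks to clone.

```python
from typing import Any, Dict, List

# Hypothetical step definitions: each step clones an existing "template" Task.
STEPS: List[Dict[str, Any]] = [
    {"name": "train", "base_task_name": "train model", "parents": []},
    {"name": "verify", "base_task_name": "model verification", "parents": ["train"]},
]


def build_pipeline(pipe: Any, project: str, steps: List[Dict[str, Any]] = STEPS,
                   queue: str = "default") -> Any:
    """Register the DAG on a PipelineController-like object (pipeline logic, part 1)."""
    # Steps without an explicit queue are sent here, to be run by agents (part 2).
    pipe.set_default_execution_queue(queue)
    for step in steps:
        pipe.add_step(
            name=step["name"],
            parents=step["parents"] or None,
            base_task_project=project,
            base_task_name=step["base_task_name"],
        )
    return pipe


def main() -> None:
    # Lazy import: only needed when actually launching the pipeline from CI.
    from clearml import PipelineController

    pipe = PipelineController(name="ci-pipeline", project="examples", version="1.0.0")
    build_pipeline(pipe, project="examples")
    # The DAG logic runs in this (CI) process; the steps run on the agents.
    pipe.start_locally(run_pipeline_steps_locally=False)
```

The CI job would call `main()`; it needs ClearML credentials but no GPU, since the heavy steps execute on the agents.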
And there is still a difference between A and B, one detecting the repo and the other not?
Lol yeah, Hydra is great. Notice you still have the ability to override Hydra from the UI, so you really have the best of both worlds.
Yes, I think writer.add_figure somehow crops the image.
EnviousStarfish54 you can use Task.set_credentials
Notice that OS environment variables or trains.conf will override the programmatic credentials.
https://allegro.ai/docs/task.html#trains.task.Task.set_credentials
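A minimal sketch, assuming the `trains` package and a self-hosted trains-server: the credentials are set before `Task.init`, so the Task's session uses them. The keys, the `init_with_credentials` helper and its injectable `task_cls` parameter (there only so the call order can be checked without a server) are hypothetical; the URLs are the default local trains-server ports (API 8008, web 8080, files 8081).

```python
from typing import Any, Optional


def init_with_credentials(key: str, secret: str,
                          api_host: str = "http://localhost:8008",
                          web_host: str = "http://localhost:8080",
                          files_host: str = "http://localhost:8081",
                          project_name: str = "examples",
                          task_name: str = "credentials demo",
                          task_cls: Optional[Any] = None) -> Any:
    """Set trains-server credentials programmatically, then create the Task."""
    if task_cls is None:
        from trains import Task as task_cls  # lazy import of the real SDK
    # Must run before Task.init; OS environment variables or trains.conf
    # still take precedence over these values.
    task_cls.set_credentials(api_host=api_host, web_host=web_host,
                             files_host=files_host, key=key, secret=secret)
    return task_cls.init(project_name=project_name, task_name=task_name)
```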
Hi VexedKangaroo32, there is now an RC with a fix:
pip install trains==0.13.4rc0
Let me know if it solved the problem
Hi FancyWhale93, you can disable the automatic model uploading with:
@PipelineDecorator.component(..., auto_connect_frameworks={'pytorch': False})
def step():
    pass
Hi FierceFly22
Hi, does anyone know where trains stores TensorBoard data?
TensorBoard data is stored wherever you point your file-writer to 🙂
While TensorBoard writes its own data to disk, trains takes the same data (in-flight) and sends it to the trains-server. The trains-server puts everything in the DB, so later everything is viewable & searchable.
Basically, you don't need to keep your TB files after your experiment is done; you have all the data in the trains-s...
Is this caused by running the script with the arguments?
Yep 🙂
Working on it as we speak 🙂 Hopefully in the next release (probably next week)
Will these 10 experiments have different names? How will I know they are part of the 'mnist1' HPO case?
Yes (they will have the specific HP name/value combination).
FYI names are not unique so in theory you could have multiple experiments with the same name.
If you look under the Configuration tab, you will find all the configuration arguments for the experiment. You can also add specific arguments to the experiment table (click the cogwheel at the top right corner, and select...
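Since names are not unique, a script can tell trials apart by Task ID and hyperparameter values instead. A minimal sketch, assuming the ClearML `Task.get_tasks()` and `Task.get_parameters()` calls and that the trial Tasks can be selected by project name; the `summarize_trials` / `fetch_trials` helpers and the `General/` section filter are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def summarize_trials(tasks: Iterable[Any], prefix: str = "General/") -> List[Dict[str, Any]]:
    """Return one row per trial: Task ID, name and the hyperparameters under `prefix`."""
    rows = []
    for task in tasks:
        params = task.get_parameters() or {}  # flat dict, e.g. {"General/lr": "0.1"}
        hparams = {k: v for k, v in params.items() if k.startswith(prefix)}
        rows.append({"id": task.id, "name": task.name, "params": hparams})
    return rows


def fetch_trials(project_name: str) -> List[Dict[str, Any]]:
    from clearml import Task  # lazy import of the real SDK

    return summarize_trials(Task.get_tasks(project_name=project_name))
```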
RobustSnake79 I have not tested it, but I suspect that currently all the reports will stay in TB and will not be passed automagically into ClearML.
It seems like something you would actually want to do with TB (i.e. drill into the graphs etc.), no?
NastyOtter17 can you provide some more info?
Hi SourSwallow36
What do you mean by "log each experiment separately"? How would you differentiate between them?
TenseOstrich47 as long as the machine running the agent has credentials for your ECR, the agent will be able to pull any docker container it runs. There is no need to manually change anything; notice the Task itself contains the name of the image it will use.
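A minimal sketch of storing the image name on the Task, assuming the ClearML `Task.set_base_docker()` method with the image passed as its first argument, and the standard ECR image URI format `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. The account ID, region, repository and the `ecr_image_uri` / `use_ecr_image` helpers are hypothetical; the pull itself still relies on the agent machine's own ECR login.

```python
from typing import Any


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build a private ECR image URI in the standard AWS format."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def use_ecr_image(task: Any, image: str) -> None:
    # Only records the image name on the Task; the agent running the Task
    # pulls it with whatever registry credentials its machine already has.
    task.set_base_docker(image)
```

In practice `task` would come from `Task.init(...)`, and the URI from `ecr_image_uri()` with your own account, region and repository.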
GrievingTurkey78
Both are now supported; they basically act the same way 🙂 and log the overrides + the final OmegaConf.
MinuteGiraffe30 if you run the following command while your current directory is where your code is, what do you get?
$ git ls-remote --get-url origin
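The same check from Python, as a minimal sketch: it runs exactly `git ls-remote --get-url origin` in a given directory and returns the printed URL, or `None` if git is missing, the directory does not exist, or the command fails. The `get_origin_url` name is hypothetical.

```python
import subprocess
from typing import Optional


def get_origin_url(repo_dir: str) -> Optional[str]:
    """Return what `git ls-remote --get-url origin` prints inside `repo_dir`."""
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--get-url", "origin"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git not installed, bad directory, or command error
    return result.stdout.strip() or None
```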
This doesn't seem to be running inside a container...
What's the clearml-agent launch command you are using? (i.e. do you have the --docker flag?)
Yes, I think the difference is running conda install with arguments vs. conda install with an env file...