
Any plans to add an unpublished state for clearml-serving?
Hmm OddShrimp85 do you mean like a flag, not being served?
Should we use archive?
The publish state basically locks the Task/Model so they cannot be changed. Should we enable unlocking (i.e. un-publish), wdyt?
GiganticTurtle0 what's the Dataset Task status?
DepressedChimpanzee34 <character> will almost always be converted into \ because otherwise it will not support \t or \n etc.
What I'm looking here is some logic that will allow us not to break backwards compatibility on the one hand, but still will allow you to have something like "first\second" entry.
WDYT? any ideas? (I really want to make sure we fix it as soon as possible)
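As a side note, a tiny plain-Python illustration (not clearml-specific, made-up strings) of why a bare backslash is ambiguous:
# '\t' and '\n' are escape sequences, so a single literal backslash has to be escaped
escaped = 'first\\second'   # explicit escaping: one literal backslash in the string
raw = r'first\second'       # raw string: same result, one literal backslash
tab = 'first\tsecond'       # interpreted as a tab character
print(escaped, raw, tab, sep='\n')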
(or woman or in between, we are supportive as long as the code is working 🙂 )
I double checked with the guys this issue was fixed in 1.14 (of clearml server). It should be released tomorrow / weekend
If that's the case, check the free space in the monitoring of the experiment; you will find the free space in GB logged
like this.. But when I am cloning the pipeline and changing the parameters, it is running on the default parameters given when the pipeline was first run
Just making sure, you are running the cloned pipeline with an agent, correct?
What is the clearml version you are using?
Is this reproducible with the pipeline example ?
@<1523710674990010368:profile|GreasyPenguin14> If I understand correctly you can use tokens as user/pass (it's basically the same interface from the git client perspective, meaning from ClearML's side):
git_user = gitlab-ci-token
git_pass = <the_actual_token>
WDYT?
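For reference, a minimal sketch of the relevant clearml.conf section on the agent machine (the token value is a placeholder, assuming a GitLab CI job token):
agent {
    # token-based auth looks like user/pass from the git client's perspective
    git_user: "gitlab-ci-token"
    git_pass: "<the_actual_token>"
    git_host: "gitlab.com"
}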
Good point!
I'll make sure we do 🙂
@<1533620191232004096:profile|NuttyLobster9> I think we found the issue: when you are passing a direct link to the python venv, the agent fails to detect the python version, and since the python version is required for fetching the correct torch, it fails to install it. This is why passing CLEARML_AGENT_PACKAGE_PYTORCH_RESOLVE=none helps, because it skips resolving the torch / CUDA version (which requires parsing the python version).
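For example, one way to set it when launching the agent (a sketch, assuming daemon mode and a queue named "default"):
export CLEARML_AGENT_PACKAGE_PYTORCH_RESOLVE=none
clearml-agent daemon --queue default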
I just tested the master with https://github.com/jkhenning/ignite/blob/fix_trains_checkpoint_n_saved/examples/contrib/mnist/mnist_with_trains_logger.py on the latest ignite master and Trains, it passed, but so did the previous commit...
The quickest workaround would be, in your final code just do something like:
my_params_for_hpo = {'key': omegaconf.key}
task.connect(my_params_for_hpo, name='hpo_params')
call_training_with_value(my_params_for_hpo['key'])
This will initialize my_params_for_hpo with the values from OmegaConf, and allow you to override them in the hyperparameter section (task.connect is two-way: in manual mode it stores the data on the Task, in agent mode it takes the values from the Task and puts them back).
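Put together, a minimal self-contained sketch of the pattern (project/task names and the value are placeholders; in your code the value would come from OmegaConf):
from clearml import Task

task = Task.init(project_name='examples', task_name='hpo params demo')

# placeholder value; in the real code this would come from the OmegaConf object
my_params_for_hpo = {'key': 0.001}

# two-way sync: a manual run stores the dict on the Task,
# an agent run overrides the dict with the values stored on the Task (e.g. set by the HPO)
task.connect(my_params_for_hpo, name='hpo_params')

print(my_params_for_hpo['key'])  # use the (possibly overridden) value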
Hi @<1541229818828296192:profile|HurtHedgehog47>
plots we create in the notebook are not saved as they were made.
I'm assuming these are matplotlib plots ?
Notice that ClearML tries to convert the plot into an interactive plot; in that process, colors and legends are sometimes lost (they become generic).
You can however manually report the plot, and force it to store it as non-interactive:
task.logger.report_matplotlib_figure(
    title="Manual Reporting", series="Just a plot", iteration=0,
    figure=plt.gcf(), report_image=True)  # report_image=True stores it as a static (non-interactive) image
Hi ElegantCoyote26
what's the clearml version you are using?
Well it is there, do you have it in your docker-compose as well?
https://github.com/allegroai/trains-server/blob/master/docker-compose.yml#L55
Then in theory (since the backend is python based) you just need to find a base docker image to build it on.
Hi ExuberantBat52
I do not think you can... I would use AWS Secrets Manager to push the entire user list config file, wdyt?
I think you can force it to be started, let me check (I'm pretty sure you can on an aborted Task).
The agent IP? Generally what's the expected pattern to deploy and scale this for multiple models?
Yes, the agent's IP, and with multiple agents one would probably use k8s for the nodes, then configure ingress. This is the next step for clearml-serving, adding support for KFServing or manually configuring the ingress. wdyt?
So you mean 1.3.1 should fix this bug?
Yes it should, see the release notes, there are a few "disappearing" UI fixes:
https://github.com/allegroai/clearml-server/releases/tag/v1.3.0
Hi MinuteWalrus85
This is a great question, and super important when training models. This is why we designed a whole system to manage datasets (including storage querying, balancing data, and caching). Unfortunately this is only available in the paid tier of Allegro... You are welcome to contact the sales guys: https://allegro.ai/enterprise/
🙂
Based on what I see, when the EC2 instance starts it installs the latest; could it be this instance is still running?
Hi SmugLizard24
The question is what is the reason for the issue?
That is a good question, could it be out of memory? (trying to compress or send the file in one chunk?)
is the number of calls performed, not what those calls were.
oh, yes this is just a measure of how many API calls are sent.
It does not really matter which ones
ShallowCat10 try something similar to this one, do notice that it might take a while to get all the task objects, so I would start with a single one 🙂
from trains import Task
tasks = Task.get_tasks(project_name='my_project')
for task in tasks:
    scalars = task.get_reported_scalars()
    for x, y in zip(scalars['title']['original_series']['x'], scalars['title']['original_series']['y']):
        task.get_logger().report_scalar(title='title', series='new_series', value=y, iteration=x)
Really what I need is for A and B to be separate tasks, but guarantee they will be assigned to the same machine so that the clearml dataset cache on that machine will be warm.
I think that what you are looking for is multi-machine cache (which is fully supported). Basically mount an NFS/SMB folder from a NAS to any of those machines, configure the cache folder to point to it, and now you do not need to worry about affinity, no?
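For example, a sketch of the relevant clearml.conf snippet on each machine (the mount point is a placeholder):
sdk {
    storage {
        cache {
            # point the local cache at the shared NFS/SMB mount
            default_base_dir: "/mnt/shared_clearml_cache"
        }
    }
}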
Is there a way to group A and B into a sub-pipeline, h...