
Reputation
Badges 1
25 × Eureka!if I want to run the experiment the first time without creating theΒ
template
?
You mean without manually executing it once ?
I'm not familiar with this one, I think you should be able to control it with:
None
CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR
SmallDeer34 I have to admit this reference is relatively old, maybe we should update to auther http://clearml.ml (would that make sense ?)
Hi @<1578193378640662528:profile|MoodySeaurchin4>
but is it possible to log some metrics too, like rmse or the likes? If so, how would you do it?
Sure, I'm assuming this is part of the output ? if not, this means this is part of your code, and if this is the case then yes you should use collect_custom_statistics_fn
None
`collect_custom_statistics_fn({'rmse'...
Hi ItchyJellyfish73
The behavior should not have changed.
"force_repo_requirements_txt" was always a "catch all option" to set a behavior for an agent, but should generally be avoided
That said, I think there was an issue with v1.0 (cleaml-server) where when you cleared the "Installed Packages" it did not actually cleared it, but set it to empty.
It sounds like the issue you are describing.
Could you upgrade the clearml-server
and test?
So clearml server already contains an authentication layer (JWT Token), and you do have a full user management on top:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#web-login-authentication
Basically what I'm saying if you add httpS on top of the communication, and only open the 3 ports, you should be good to go. Now if you really need SSO (AD included) for user login etc, unfortunately this is not part of the open source, but I know they have it in the scale/ent...
Hi BattyLion34
I might have a solution, in order to make sure the two agents are not sharing the "temp" folder:
create two copies of ~/clearml.conf , let's call them :
~/clearml_service.conf ~/clearml_agent.confThen in each one select a different venvs_dir
see here:
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L90
for example:
~/.clearml/venvs-builds1 ~/.clearml/venvs-builds2Now start the two agents with:
The service age...
Sounds great! let me know what you find out π
Hi @<1523701066867150848:profile|JitteryCoyote63>
Hi, how does
agent.enable_git_ask_pass
works
basically it pushes the pass through stdin to git when it asks (it is a git feature)
FYI: These days TB became the standard even for pytorch (being a stand alone package), you can actually import it from torch.
There is an example here:
https://github.com/allegroai/trains/blob/master/examples/frameworks/pytorch/pytorch_tensorboard.py
HealthyStarfish45 did you manage to solve the report_image issue ?
BTW: you also have
https://github.com/allegroai/trains/blob/master/examples/reporting/html_reporting.py
https://github.com/allegroai/trains/blob/master/examples/reporting/...
So it sounds as if for some reason calling Task.init inide a notebook on your jupyterhub is not detecting the notebook.
Is there anything special about the jupyterhub deployment ? how is it deployed ? is it password protected ? is this reproducible ?
Queues can have multiple workers, and that implies multiple instances of a task can run concurrently.
@<1533619716533260288:profile|SmallPigeon24> as long as these are the Exact same instances you can have them runing simultaneously (think multi node training), that said each one should "know" not to report over the others, because of course it will overwrite the reports.
Back to your point on multiple agents:
You cannot have two Tasks in the same queue, that means that a single agen...
Actually it would be interesting to combine the two, feast is fully open-source and supported by the linux foundation, so I cannot see the harm in that.
wdyt?
Hmm that makes sense to me, any chance you can open a github issue so we do not forget ? (I do not think it should be very complicated to fix)
Hi @<1697056701116583936:profile|JealousArcticwolf24>
You have clearml Datasets None
It will version catalog and store meta-data of your datasets.
Each version only stores the delta from the parent version, but delta is on a file granularity not a "block" granularity
Notice that under the hood of course it uses storage solutions to store and cache the underlying immutable copy of the data. What's your use case?
Hi JitteryCoyote63
The NVIDIA_VISIBLE_DEVICES
is set automatically for the process the trains-agent spins, so from your code, it is transparent, you can only "see" GPU 0.
(Obviously not using docker you can forcefully change the OS environment in runtime, but you should avoid that ;))
Hi, I would like to understand how I can set the pip cache location for my agent,
ClumsyElephant70 by default the pip cache (and all other cache folders) are mounted back into the host itself ~/.clearml/
I'm assuming the idea is shared cache, if this is the case, do:docker_pip_cache = ~/my_shared_nfs/pip-cache
https://github.com/allegroai/clearml-agent/blob/e3e6a1dda81bee2dd20a64d09746568e415f1823/docs/clearml.conf#L139
Hi @<1549202366266347520:profile|GorgeousMonkey78>
how do I integrate sagemaker with clearml ,
you mean to launch an experiment, or just to log it?
Thank you @<1689446563463565312:profile|SmallTurkey79> !!!
Hi MelancholyBeetle72 , that's a very interesting case. I can totally understand how storing a model and then immediately renaming it breaks the upload. A few questions, is there a way for pytorch lightning not to rename the model? Also I wonder if this scenario happens a lot (storing model and changing it) . I think the best solution is for Trains to create a copy of the file and upload it in the background. That said the name will still end with .part What do you think?
Hi @<1639799308809146368:profile|TritePigeon86>
Sounds awesome, how can we help?
Hi @<1541954607595393024:profile|BattyCrocodile47>
It seems to me that instead of implementing webhooks to react to things like adding a tag to a model
Did you look at this example ?
None
Can we straightforwardly stream ALL ClearML events to another system?
what would you consider an event?
The "basic" object type is Task, a state in task is changed via an api call, would that be an e...
Hi StaleButterfly40
but if I sync more than once I get a duplication of each line in log
Hmm.. let me check if we can "force" overwriting (it might require you to have a more stateful code for the sync process)
sometime we resume training
How would that work in offline mode? The offline process cannot sync with the backend... Are you saying you would like to get a new capability, "continue-offline-session" ?
Hi @<1523701111020589056:profile|DefiantSpider5>
So there are two answers here, I'll start with the open-source version of both
Is there a way in clear ml to interactively view subsets of images based on a lasso of embedding plots
The ClearML Datasets have no "query" capabilities of the data inside the dataset. That means you can see preview images, statistics and download the datasets, but no query capabilities. On the other hand, there is no limitation on the type and format of me...
Hi @<1523701323046850560:profile|OutrageousSheep60>
What do you mean by "in clearml server" ? I do not see any reason a subprocess call from a Task will be an issue. What am I missing ?