Hi @<1610083503607648256:profile|DiminutiveToad80>
<h1>Request Entity Too Large</h1>
What's the size of the file? How are you running your clearml-server?
ClearML automatically picks up these reported metrics from TB. Since you mentioned seeing the scalars, I assume Hugging Face reports to TB. Could you verify? Is there a quick code sample to reproduce?
Thanks SmallDeer34, I think you are correct: the "output" model is returned properly, but "input" models are returned as the model name, not the model object.
Let me check something
I'll make sure we add the reference somewhere on GitHub
Hi SmallDeer34
Can you see it in TB ? and if so where ?
save off the "best" model instead of the last
It should be relatively easy to update the main Task with the best-performing model, no?
SmarmySeaurchin8
When running in "dev" mode (i.e. writing the code), only packages imported directly are registered under "installed packages". Then, when the agent executes the experiment, it updates back the entire environment (including derivative packages etc.)
That said, you can set detect_with_pip_freeze to true (in trains.conf) and it will basically store the entire pip freeze.
https://github.com/allegroai/trains/blob/f8ba0495fb3af1f99732fdffbbccd2fa992934a4/docs/trains.c...
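For reference, a minimal sketch of what that setting could look like in trains.conf (I'm assuming the key path is sdk.development.detect_with_pip_freeze; verify against the linked config file for your version):

```
sdk {
    development {
        # store the full `pip freeze` output instead of
        # only the directly-imported packages
        detect_with_pip_freeze: true
    }
}
```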
BTW, it has this at the bottom:
Yes, it is the company legal entity name. But I think that for referencing it makes more sense to mention the product name, ClearML.
I think this looks good 🙂
SmallDeer34 in theory no reason it will not work with it.
If you are doing a single node (from Ray's perspective)
This should just work; the challenge might be multi-node Ray + ClearML, as you will have to use ClearML to set up the environment and Ray as the messaging layer (think OpenMPI etc.)
What did you have in mind?
Thanks SmallDeer34 !
Would you like us to? How about a footnote/acknowledgement?
How about a reference / footnote?

@misc{clearml,
  title = {ClearML - Your entire MLOps stack in one open-source tool},
  year = {2019},
  note = {Software available from },
  url = { },
  author = {allegro.ai},
}
the parent task ids is what I originally wanted, remember?
ohh I missed it 🙂
SmallDeer34 the function Task.get_models() incorrectly returned the input model "name" instead of the object itself. I'll make sure we push a fix.
I found a different solution (hardcoding the parent tasks by hand),
I have to wonder: how does that solve the issue?
Hi JealousParrot68
Spin up the clearml-agent with docker support (i.e. each experiment runs inside its own container):
https://clear.ml/docs/latest/docs/clearml_agent#docker-mode
Basically you can specify a default docker to use (per agent) and a specific docker container to use per Task (configured in the UI under execution at the bottom)
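As a sketch, the per-agent default container can be set in clearml.conf on the agent machine (I'm assuming the key is agent.default_docker.image; the image name below is just an example):

```
agent {
    default_docker {
        # container image used when a Task does not
        # specify its own docker in the UI
        image: "nvidia/cuda:11.8.0-runtime-ubuntu22.04"
    }
}
```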
So on the EC2 instance (with the agent running), just install prior to running the agent:
apt-get install poppler-utils
SubstantialElk6
The CA is picked up automatically by urllib; check the OS environment variables you need to configure: SSL_CERT_FILE and REQUESTS_CA_BUNDLE.
https://stackoverflow.com/questions/27835619/urllib-and-ssl-certificate-verify-failed-error
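A minimal sketch of wiring those environment variables before starting the agent (the bundle path is hypothetical; point it at your actual CA bundle):

```shell
# Hypothetical CA bundle path; replace with your own
export SSL_CERT_FILE=/etc/ssl/certs/my-ca-bundle.pem
# the requests library reads its own variable, so point it at the same bundle
export REQUESTS_CA_BUNDLE="$SSL_CERT_FILE"
# then start the agent in this shell, e.g.:
# clearml-agent daemon --queue default
```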
Hi VexedCat68
Are we talking YouTube videos? Docs? Courses?
Yep it should :)
I assume you add the previous iteration somewhere else, and this is the cause for the issue?
Depending on your security restrictions, but generally yes.
maybe I should use explicit reporting instead of Tensorboard
It will do just the same 🙂
there is no method for setting last iteration, which is used for reporting when continuing the same task. Maybe I could somehow change this value for the task?
Let me double check that...
overwriting this value is not ideal though, because for :monitor:gpu and :monitor:machine ...
That is a very good point
but for the metrics, I explicitly pass th...
Hi @<1523701868901961728:profile|ReassuredTiger98>
is there something like a clearml context manager to disable automatic logging?
Sure, just use a wildcard for the files you actually want to auto-log; the rest will be ignored:
task = Task.init(..., auto_connect_frameworks={'pytorch': '*.pt'})
sdk.conf will add it to the default loaded values (as I think you deduced).
Can you copy paste the sdk.conf here? (maybe something is missing there?)
Under your profile you should be able to see it
at that point we define a queue and the agents will take care of training
This is my preferred way as well :)
Sounds good to me 🙂