ok, yes, but this will install the package of the branch specified there.
Correct
So if I'm working on my own branch and want to run an experiment, I would have to manually put my current branch name in the git path.
When you say your own branch, you mean local (i.e. not pushed to the remote git repo)?
Hi FriendlyKoala70 you can edit the installed package section and add the missing package. See more details on how trains-agent works here (although it's on conda the same rules apply for pip) https://github.com/allegroai/trains-agent/issues/8
Basically it hooks into any torch.save function (monkey patching in realtime)
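To make the "monkey patching" idea concrete, here is a minimal sketch of the general technique, not the actual library code. `fake_torch`, `patch_save` and `recorded_paths` are hypothetical names made up for illustration; the point is only that the original function is swapped for a wrapper at runtime, so the user's script does not change.

```python
import types

# Hypothetical stand-in for a framework module exposing a save() function.
fake_torch = types.SimpleNamespace()


def _original_save(obj, path):
    """Pretend to serialize obj to path."""
    return f"saved {obj!r} to {path}"


fake_torch.save = _original_save

# Every checkpoint path seen by the hook is recorded here.
recorded_paths = []


def patch_save(module):
    """Replace module.save with a wrapper that delegates, then records the path."""
    original = module.save

    def wrapper(obj, path, *args, **kwargs):
        result = original(obj, path, *args, **kwargs)
        # A real hook would register or upload the model at this point.
        recorded_paths.append(path)
        return result

    module.save = wrapper
    return original  # returned so the patch can be undone later


original = patch_save(fake_torch)
# The calling code is unchanged, yet the hook sees the checkpoint.
fake_torch.save({"weights": [1, 2, 3]}, "checkpoint.pt")
```

Because the wrapper calls the original first, the save still behaves the same from the caller's side; the hook only observes it.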
Very Cool!
BTW guys, are you using the task.models[] to continue from the last checkpoint? or is it task.artifacts[] ?
Hey SarcasticSparrow10 see here
https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_linux_mac.html#upgrading
I did change the
instead of 8080?
So this is the issue
Yes, as long as the client is served from http://app.something.com it will look for the api server at http://api.something.com
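As a rough illustration of that naming convention (a sketch only, not the web app's actual code), the `app.` prefix of the host is swapped for `api.`. `guess_api_url` is a hypothetical helper name, and it deliberately ignores ports and paths.

```python
from urllib.parse import urlsplit, urlunsplit


def guess_api_url(app_url: str) -> str:
    """Map http://app.<domain> to http://api.<domain> (hypothetical helper)."""
    parts = urlsplit(app_url)
    host = parts.hostname or ""
    if not host.startswith("app."):
        raise ValueError(f"host {host!r} does not follow the app.<domain> convention")
    # Keep the scheme, swap only the subdomain prefix; port and path are dropped.
    api_host = "api." + host[len("app."):]
    return urlunsplit((parts.scheme, api_host, "", "", ""))


api_url = guess_api_url("http://app.something.com")
```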
YummyWhale40 you mean like continue training?
https://github.com/allegroai/trains/issues/160
GiganticTurtle0 fix was just pushed to GitHub
pip install git+
Specifically for model files, if you set Task.init(..., output_uri=True) it will automatically upload any saved model to the files server (you can also point to any object storage / shared folder)
What's the framework you are using ?
YummyWhale40 no idea what the pytorch-lightning guys did there. Let me check the actual code.
Hmm, I think the approach in general would be to create two pipeline tasks, then launch them from a third pipeline or trigger them externally? If on the other hand it makes sense to see both pipelines on the same execution graph, then nested components make a lot of sense. Wdyt?
That's not possible, right?
That's actually what "start_locally" does, but the missing part is starting it on another machine without the agent (I mean it's totally doable, and if it's important I can explain how, but this is probably not what you are after)
I really need to have a dummy experiment pre-made and have the agent clone the code, set up the env and run everything?
The agent caches everything, and actually can also just skip installing the env entirely. which would mean ...
ThickDove42 looking at the code, I suspect it fails interacting with the actual jupyter server (that is running on the same machine, but still).
Any chance you have a firewall on the Windows machine?
Hi RipeGoose2
You can also report_table them? what do you think?
https://github.com/allegroai/clearml/blob/master/examples/reporting/pandas_reporting.py
https://github.com/allegroai/clearml/blob/9ff52a8699266fec1cca486b239efa5ff1f681bc/clearml/logger.py#L277
I mean to reduce the API calls without reducing the scalars that are logged, e.g. by sending less frequent batched updates.
Understood,
In my current trials I am using up the API calls very quickly though.
Why would that happen?
The logging is already batched (meaning one API call for a bunch of stuff)
Could it be lots of console lines?
BTW you can set the flush period to 30 sec, which would automatically collect and batch API calls
https://github.com/allegroai/clearml/blob/25df5efe7...
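To show what a flush period buys you, here is a minimal sketch of period-based batching in general, not the ClearML implementation. `BatchingReporter`, `send` and the injected `clock` are hypothetical names; events accumulate locally and go out as a single call once the period has elapsed.

```python
import time


class BatchingReporter:
    """Collect events and send them as one batch per flush period (illustrative)."""

    def __init__(self, send, flush_period_sec=30.0, clock=time.monotonic):
        self._send = send              # callable taking a list of events (one "API call")
        self._period = flush_period_sec
        self._clock = clock            # injectable so the example needs no real waiting
        self._pending = []
        self._last_flush = clock()

    def report(self, event):
        self._pending.append(event)
        if self._clock() - self._last_flush >= self._period:
            self.flush()

    def flush(self):
        if self._pending:
            self._send(list(self._pending))
            self._pending.clear()
        self._last_flush = self._clock()


# Fake clock: advance time by hand instead of sleeping.
now = [0.0]
calls = []
reporter = BatchingReporter(send=calls.append, flush_period_sec=30.0, clock=lambda: now[0])

for step in range(5):
    reporter.report(("loss", step))   # buffered, nothing sent yet

now[0] = 31.0                         # more than 30 sec later
reporter.report(("loss", 5))          # triggers a single batched send
```

Six scalars are reported here but only one send happens, which is the trade-off a longer flush period makes: fewer calls, slightly delayed updates.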
Maybe that's the issue:
https://github.com/googleapis/python-storage/issues/74#issuecomment-602487082
The file is never touched; nowhere in the process is that file deleted.
It should never have gotten there; this is not the git repo folder, it is one level above...
Hi MortifiedCrow63
I finally got GS credentials, and there is something weird going on. I can verify the issue: with model upload I get a timeout error, while upload_artifacts just works.
Just updating here that we are looking into it.
This looks exactly like the timeout you are getting.
I'm just not sure what's the diff between the Model autoupload and the manual upload.
Yes, I think we just found out it breaks clearml
Could you test with the latest stable, just in case?
(I'll make sure we have an RC that supports the hydra dev version)
Hi MortifiedCrow63
Sorry, getting GS credentials is taking longer than expected
Nonetheless it should not be an issue (model upload is essentially using the same StorageManager internally)
Hi AdventurousWalrus90
Thank you for the kind words!
/home/usr_338436_ulta_com/.clearml/venvs-builds/3.7/.gitignore
So this is the error on the agent?
Just curious about the timeout, was it configured by ClearML or by GCS? Can we customize the timeout?
I'm assuming this is GCS; in the end the actual upload is done by the GCS Python package.
Maybe there is an env variable ... Let me google it
Internally we use blob.upload_from_file, which has a default 60 sec timeout on the connection (I'm assuming the upload could take longer).
Glad to hear!
(yeah ColossalReindeer77 I'm with you, the override is not intuitive. I'll pass the info to the technical writers, hopefully they can find a way to make it easier to understand)