Hi PricklyJellyfish35
My apologies, this thread was forgotten 🙂
What's the current status with OmegaConf? (I'm not sure I understand what you mean by resolve=False)
Which means you currently save the arguments after resolving, and I'm looking to save them explicitly so the user will not forget to change some dependencies.
That is correct
I'm looking to save them explicitly so the user will not forget to change some dependencies.
Hmm, interesting point. What's the use case for storing the values before resolving?
Do we want to store both ?
The main reason for storing the post-resolve values is that you have full visibility into the actual...
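For reference, a minimal sketch of the difference with plain OmegaConf (the config keys here are made up):
` from omegaconf import OmegaConf

cfg = OmegaConf.create({"lr": 0.1, "schedule_lr": "${lr}"})

# resolve=False keeps the interpolation exactly as the user wrote it
print(OmegaConf.to_yaml(cfg, resolve=False))  # schedule_lr: ${lr}

# resolve=True bakes in the resolved value
print(OmegaConf.to_yaml(cfg, resolve=True))   # schedule_lr: 0.1 `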
Any chance you actually run the second script with Popen (i.e. calling python as a subprocess)?
Hi UnsightlySeagull42
Just making sure, the two scripts are on your git repo ?
Hi CleanPigeon16
I was wondering how (or if) you handle interruptions.
Good question, basically (and I might be missing a few details but I think that's the general gist).
A new instance will be spun up (spot/regular based on your "compute budget") as long as there is a job in the "monitored" queue. That means that if a worker was kicked by Amazon (i.e. a spot instance), another one will be spun up in its place as long as there is a job in the queue. That means that what is probably missing in you...
Are there any services OOB like this?
On the open-source side, I can't recall any, but it would probably be easy to write. The paid tier might have an offering though, not sure 🙂
We're wondering how many on-premise machines we'd like to deprecate.
I think you can see that in the queues tab, no?
Hi GrievingTurkey78 yes, /opt/clearml should contain everything.
That said, back up only after you spin down the DBs so they serialize everything.
Hi LudicrousDeer3
I have to admit I cannot remember one in the wild (I might be wrong though).
What's the specific use case you had in mind ?
Hi SubstantialElk6
but in terms of data provenance, it's not clear how I can associate the data versions with the processes that created them.
I think DeliciousBluewhale87's approach is what we are aiming for, but with code.
So using clearml-data from the CLI is basically storing/versioning of files (with diff-based storage etc., but still).
What you are after (I think) is using the programmatic Dataset class in your preprocessing code, to create the Dataset from code, this a...
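A minimal sketch of that programmatic flow (dataset/project names and paths are placeholders):
` from clearml import Dataset

# create a new dataset version from inside your preprocessing code
dataset = Dataset.create(
    dataset_name="processed_data",
    dataset_project="data",
    parent_datasets=None,  # pass parent dataset IDs to record lineage
)
dataset.add_files(path="/path/to/processed/files")
dataset.upload()
dataset.finalize() `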
so all models are part of the same experiment and have the experiment name in their name.
Oh, that explains it. (1) You can use the model filename to control the model name in clearml; (2) you can disable the autologging and manually upload the model, then you can control the model name.
wdyt?
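For option (2), a minimal sketch (project/task/file names are placeholders, and it assumes the PyTorch autologger; swap the framework key for yours):
` from clearml import Task, OutputModel

task = Task.init(
    project_name="examples",
    task_name="manual model logging",
    # disable the framework autologging (key depends on your framework)
    auto_connect_frameworks={"pytorch": False},
)
# ... training ...
output_model = OutputModel(task=task, name="my-model-name")  # explicit model name
output_model.update_weights(weights_filename="model.pt")  # manual upload `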
I guess I need to do something like the following after the task was created:
...
Yes!
Why use the "post" callback and not the "pre" callback?
The post callback gets back the Model object. The pre callback allows you to decide if you actually want to log in the first place (come to think of it, maybe you want that as well 🙂 )
post_optional_packages: ["google-cloud-storage", ]
This will install it last (i.e. after all the other packages), but only if it appears in the "Installed packages" list.
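For context, this is an agent-side setting; a sketch of where it would sit in clearml.conf (section layout worth verifying against the agent config reference for your version):
` agent {
    package_manager {
        # installed last, and only if it already appears in "Installed packages"
        post_optional_packages: ["google-cloud-storage", ]
    }
} `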
The problem is that I currently don't have a way to get them "from outside".
Maybe as a hack (until we add the model object)
` from clearml.binding.frameworks import WeightsFileHandler

class MyModelCB:
    current_args = dict()

    @classmethod
    def callback(cls, load_save, model_info):
        # only rename when the model is being saved, not loaded
        if load_save != "save":
            return model_info
        # build the name from the externally set args
        model_info.name = "my new name " + str(cls.current_args)
        return model_info

WeightsFileHandler.add_pre_callback(MyModelCB.callback)
MyModelCB.current_args = {"args": "value"} `
wdyt?
Hmm apparently it is not passed, but it could be.
Would the object itself be enough to get the values? Wouldn't it make sense to get them from outside somehow? (I'm assuming there is one set of args used at any given moment?)
Okay, let me check it, but I suspect the issue is running over SSH; to overcome these issues with PyCharm we have a specific plugin to pass the git info to the remote machine. Let me check what we can do here.
FiercePenguin76 BTW, you can do the following to add / update packages on the remote session: `clearml-session --packages "newpackage>x.y" "jupyterlab>6"`
DistressedGoat23
you can now access the weights model object. First: `pip install clearml==1.8.1rc0`
then:
` from clearml.binding.frameworks import WeightsFileHandler

def callback(_, model_info):
    model_info.weights_object  # this is your xgboost object
    model_info.name = "my new name"
    return model_info

WeightsFileHandler.add_pre_callback(callback) `
Is that normal or a possible bug?
This sounds like xgboost's internal format; it would make sense to me for it to be joblib (which is like pickle, only faster and safer)
Let me see if we can also add the model object to the callback...
SmarmyDolphin68 sadly, if this was not executed with trains (i.e. the offline option of trains), this is not really doable (I mean it is, if you write some code and parse the TB 🙂 but let's assume this is way too much work)
A few options:
- On the next run, use the clearml OFFLINE option (i.e. in your code call Task.set_offline(), or set the env variable CLEARML_OFFLINE_MODE=1)
- You can compress and upload the checkpoint folder manually, by passing the checkpoint folder, see https://github.com...
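A minimal sketch of that offline flow (project/task names and the session path are placeholders):
` from clearml import Task

# enable offline mode before Task.init() is called
Task.set_offline(True)
task = Task.init(project_name="examples", task_name="offline run")
# ... train / report as usual; everything is recorded locally ...
task.close()

# later, on a machine with connectivity, import the stored session:
# Task.import_offline_session("/path/to/offline_session.zip") `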
TroubledHedgehog16 generally speaking you can expect about 10 API calls per minute if you have many reports, and about 3 per minute with low reporting. We just optimized the SDK so that in cases with lots of consecutive reports they are better batched; I would recommend the latest RC.
Is this information stored anywhere or do I need to explicitly log this data somehow?
On the creating Task, alongside all the other reports.
Basically each model stores its creating Task (Task ID); using the Task ID you can query all the metrics reported by the task.
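If helpful, a sketch of that lookup via the SDK (the model ID is a placeholder; assumes the model object exposes the creating Task ID via .task):
` from clearml import InputModel, Task

model = InputModel(model_id="<your-model-id>")  # placeholder ID
creating_task = Task.get_task(task_id=model.task)  # the Task that created the model
scalars = creating_task.get_reported_scalars()  # all metrics reported by that task `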
This sounds like a docker build issue on macOS M1:
https://pythonspeed.com/articles/docker-build-problems-mac/
Hi TenseOstrich47
You can check the new clearml-serving, and the new python interfaces added to the "Model" class:
https://github.com/allegroai/clearml/blob/22d795f68f0175ba9511cabd444ea4dba464f3cd/clearml/model.py#L444
Ohh, hmm, that is odd, there should not be a limit there. Let me check ....
Hmmm, are you running inside PyCharm, or similar?
Hi NastyFox63
What do you mean not all of them are shown?
Do they have different series/titles? Are they plots or scalars? How are you reporting them?
yes, I see no more than 114 plots in the list on the left side in full screen mode. Just checked, and the behavior exists on Safari and Chrome
Let me check with the front-end guys 🙂
NastyFox63 ask SuccessfulKoala55 tomorrow, I think there is a way to change the default settings even with the current version.
(i.e. increase the default limit of 100 entries)