The number of entries in the dataset cache can be controlled via clearml.conf : sdk.storage.cache.default_cache_manager_size
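For example, a minimal clearml.conf snippet (the value 100 is just an illustration):
sdk {
    storage {
        cache {
            default_cache_manager_size: 100
        }
    }
}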
For future readers, see discussion here:
https://clearml.slack.com/archives/CTK20V944/p1629840257158900?thread_ts=1629091260.446400&cid=CTK20V944
You can pass --docker on the agent command line, or set it in clearml.conf: https://github.com/allegroai/clearml-agent/blob/21c4857795e6392a848b296ceb5480aca5f98e4b/docs/clearml.conf#L153
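A sketch of both options (the queue name and docker image are placeholder assumptions):
clearml-agent daemon --queue default --docker nvidia/cuda:11.8.0-runtime-ubuntu22.04
or in clearml.conf:
agent {
    default_docker {
        image: "nvidia/cuda:11.8.0-runtime-ubuntu22.04"
    }
}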
Okay, I found it. This is due to the fact that newer versions send the events/images in a subprocess (it used to be a thread).
The creation of the object is done on the main process, which updates the file index (in a round-robin manner), but the check itself happens on the subprocess, which is not "aware" of the used indexes (i.e. it is always 0, hence when exceeding the history size, it skips it).
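This is not ClearML code, just a toy illustration of the underlying issue: an index updated in the parent process after the subprocess starts is invisible inside that subprocess:
import multiprocessing as mp
import time

state = {"index": 0}

def checker():
    time.sleep(0.5)
    # this process holds a copy of `state` from when it started,
    # so the parent's later round-robin updates are never visible here
    print("child sees index =", state["index"])  # always 0

if __name__ == "__main__":
    p = mp.Process(target=checker)
    p.start()
    state["index"] += 1  # the parent updates the index after spawning
    p.join()
    print("parent sees index =", state["index"])  # 1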
MuddySquid7 the fix was pushed to GitHub, you can now install directly from the repo:
pip install git+
Is there any way to get just one dataset folder of a Dataset? e.g. only "train" or only "dev"?
They are usually stored in the same "zip", so basically you have to download both folders anyhow, but I guess if this saves space we could add this functionality, wdyt?
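For reference, a sketch of the current behavior (the dataset ID and folder name are placeholders): the full dataset is downloaded and you then point at the subfolder you need:
import os
from clearml import Dataset

ds = Dataset.get(dataset_id="aabbcc11")
local_root = ds.get_local_copy()  # downloads the entire dataset (all folders)
train_dir = os.path.join(local_root, "train")  # then use only the folder you need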
Task.init should be called before PyTorch distributed is launched; then on each instance you need to call Task.current_task() to get the task instance (and make sure the logs are tracked).
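A minimal sketch, assuming torch.multiprocessing.spawn is used to launch the workers (project/task names are placeholders):
import torch.multiprocessing as mp
from clearml import Task

def worker(rank):
    # each spawned process re-attaches to the existing task so its logs are tracked
    task = Task.current_task()
    task.get_logger().report_text("worker %d started" % rank)

if __name__ == "__main__":
    # Task.init must run in the main process, before the workers are spawned
    Task.init(project_name="examples", task_name="ddp example")
    mp.spawn(worker, nprocs=2)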
What would be the best way to get all the models trained using a certain Task, I know we can use query_models to filter models based on Project and Task, but is it the best way?
On the Task object itself you have all the models:
Task.get_task(task_id='aabb').models['output']
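For example, a quick sketch ('aabb' is a placeholder task ID):
from clearml import Task

task = Task.get_task(task_id='aabb')
for model in task.models['output']:
    print(model.name, model.url)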
JitteryCoyote63 good news
It's not a trains-server error but a trains validation error; this is easily fixed and deployed.
We are planning an RC later this week, I'll make sure this fix is part of it
Please do, just so it won't be forgotten (it won't, but for the sake of transparency)
Could it be you have an old OS environment variable overriding the configuration file?
Can you change the IP of the server in the conf file, and make sure it has an effect (i.e. the error changed)?
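A quick way to check from the shell (assuming the trains/clearml-era variable naming, e.g. TRAINS_API_HOST overriding the api server set in the conf file):
env | grep -iE 'trains|clearml'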
Thanks GrievingTurkey78
Sure, just PR it (should work with any Python/Hydra version):
kwargs['config'] = config
kwargs['task_function'] = partial(PatchHydra._patched_task_function, task_function,)
result = PatchHydra._original_run_job(*args, **kwargs)
Ohh sorry, you will also need to fix the
def _patched_task_function
The parameter order is important as the partial call relies on it.
My bad no need for that 🙂
The reasoning is that simultaneous processes will most likely fail on the GPU due to the memory limit
BTW: I think we had a better example, I'll try to look for one
Check the examples on the github page, I think this is what you are looking for 🙂
https://github.com/allegroai/trains-agent#running-the-trains-agent
Hi @<1523702307240284160:profile|TeenyBeetle18>
and the URL of the model refers to a local file, not to the remote storage.
Do you mean that in the Model tab, when you look into the model details, the URL points to a local location (e.g. file:///mnt/something/model)?
And your goal is to get a copy of that model (file) from your code, is that correct?
Wait, is "SSH_AUTH_SOCK" defined on the host? it should auto mount the SSH folder as well?!
Yes, the clearml import must come before everything else (so it can link with hydra); when you do it this way, by the time you import clearml, hydra is already done.
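A minimal sketch of the intended order (the config path/name and project/task names are placeholder assumptions):
from clearml import Task  # must be imported before hydra so it can hook into it

import hydra
from omegaconf import DictConfig

@hydra.main(config_path="conf", config_name="config")
def main(cfg: DictConfig):
    task = Task.init(project_name="examples", task_name="hydra example")
    print(cfg)

if __name__ == "__main__":
    main()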
Hi @<1552101458927685632:profile|FreshGoldfish34>
self-hosted, you mean the open source ? if so, then yes totally free 🙂
That said, I would recommend having the server inside your VPN, just in case, from a security perspective
ClearML automatically gets these reported metrics from TB; since you mentioned seeing the scalars, I assume HuggingFace reports to TB. Could you verify? Is there a quick code sample to reproduce?
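For reference, a minimal sketch of the kind of repro I mean: a TB SummaryWriter logging scalars that ClearML should pick up automatically (names are placeholders):
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="examples", task_name="tb scalars repro")
writer = SummaryWriter()
for step in range(10):
    # ClearML patches SummaryWriter, so these should appear in the task's Scalars tab
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)
writer.close()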
Hi SmallDeer34
Can you see it in TB ? and if so where ?
Thanks SmallDeer34 !
This is exactly what I needed