
Correct on both.
Notice that with upload you can specify any storage (S3/GS/Azure etc.)
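For illustration, a minimal sketch of pointing uploads at a specific storage backend (the project, task name, and bucket paths below are placeholders, assuming the standard output_uri mechanism):

from clearml import Task

# Any supported storage backend can be the upload destination,
# e.g. "s3://bucket/folder", "gs://bucket/folder" or "azure://container/folder".
task = Task.init(
    project_name="examples",
    task_name="upload destination demo",
    output_uri="s3://my-bucket/clearml",
)

# Artifacts and models reported from this task are uploaded to that destination.
task.upload_artifact(name="results", artifact_object={"accuracy": 0.91})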
WhimsicalLion91 I guess import/export is going to be more challenging, doable though. You will need to get all the Tasks, then collect all the artifacts, then collect all the reported logs (console/plots/etc). Then import everything back to your own server...
Exporting a single Task: task.export_task and Task.import_task
If you need all the scalars: task.get_reported_scalars(...)
And the console logs: Task.get_reported_console_output
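A rough sketch of what that flow could look like (the task ID and report count are placeholders; Task.import_task re-creates the task on whichever server your clearml.conf points to):

from clearml import Task

# Grab the source task and export its full definition as a dictionary
source_task = Task.get_task(task_id="<source_task_id>")
exported = source_task.export_task()

# Collect the reported data you want to carry over
scalars = source_task.get_reported_scalars()
console_lines = source_task.get_reported_console_output(number_of_reports=10000)

# With clearml.conf pointing at the destination server, re-create the task there
new_task = Task.import_task(exported)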
Hi PlainSquid19, could you add a bit more information? Are you running trains-agent? Is it in docker or venv mode? What are the trains/trains-agent/trains-server versions?
DepressedChimpanzee34 something along the lines of:

from multiprocessing.pool import ThreadPool

def get_last_metric(t):
    return t.get_last_scalar_metrics()

p = ThreadPool()
task_scalars_list = p.map(get_last_metric, top_tasks)
p.close()

We parallelized the network calls, as I'm assuming the delay is in fetching the data.
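For completeness, one hedged way "top_tasks" could be collected beforehand (the project name is a placeholder):

from clearml import Task

# Fetch the candidate tasks whose last scalar metrics we want to read in parallel
top_tasks = Task.get_tasks(project_name="examples")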
Hmmm, that is odd... Based on the reply "'Task' object has no attribute 'hyperparams'", I would assume the API version is lower than 2.9. But you specifically said you see Session.api_version == 2.9.
Is that correct?
Hi DisgustedDove53
When you say "deployment" there are a lot of ways to interpret that 🙂 What exactly are you looking for?
Hi MelancholyElk85
I have a strong deja vu feeling. Credentials are OK. How do I solve this? If you need the full log, how can I share it without sharing private information? I'm fed up with this shit
Is this coming from the agent ?
JitteryCoyote63 I think that without specifically adding torch to the requirements, the agent will not be able to automatically resolve the correct cuda/torch version. Basically you should add torch to the requirements.txt file and provide it to Task.create, or use Task.force_requirements_env_freeze
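As a rough illustration of the second option (project and task names are placeholders; the call pins the exact packages from the given requirements file, or from the local environment if none is given):

from clearml import Task

# Freeze the requirements so the agent installs these exact packages
# instead of resolving torch/cuda versions on its own.
# Must be called before Task.init.
Task.force_requirements_env_freeze(requirements_file="requirements.txt")

task = Task.init(project_name="examples", task_name="torch version pinning")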
Just making sure, can the machine you were running "trains-init" on access the API server?
Hi @<1649221394904387584:profile|RattySparrow90>
: Are the models I defined to be served e.g. via the CLI downloaded to the serving pod
Yes, this is done automatically and online (i.e. when you update using the CLI/API), based on the models/endpoints you set
So that they are physically lying there as a file I can see in the filesystem?
They are, and cached there
Or is it more the case that the pod gets the model when needed/when an API call for this model is incoming?
I...
or can I directly open a PR?
Open a direct PR and link to this thread, I will make sure it is passed along 🙂
SweetGiraffe8
That might be it, could you test with the Demo server ?
Great! btw: final v1.2.0 should be out after the weekend
GloriousPenguin2 could you open a GitHub issue on it? Just making sure this will actually get fixed 🙂
Thanks DilapidatedDucks58 ! We ❤ suggestions for improvements 🙂
Did you try to print the page using the browser? (I think they can all save it as a PDF these days.) Yes, I agree, it would 🙂 We have some thoughts on creating plugins for the system; I think this could be a good use-case. Wait a week or two ;)
I'm helping train my friend on ClearML to assist with his astrophysics research,
If that's the case, what you can do is use the agent inside your sbatch script (fully open source). This means the sbatch script becomes "clearml-agent execute --id <task_id_here>". This will set up the environment, monitor the job, and still allow you to launch it from Slurm, wdyt?
Hi VexedCat68
So if I understand correctly, the issue is this argument:

parameter_override={'Args/dataset_id': '${split_dataset.split_dataset_id}',
                    'Args/model_id': '${get_latest_model_id.clearml_model_id}'},

I think what is missing is telling it this is an artifact:

parameter_override={'Args/dataset_id': '${split_dataset.artifacts.split_dataset_id.url}',
                    'Args/model_id': '${get_latest_model_id.clearml_model_id}'},
You can see the example here:
https://clear.ml/docs/latest/docs/ref...
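For context, a rough sketch of where that override sits in a pipeline definition (project, task, and step names are placeholders; the artifact reference follows the corrected snippet above):

from clearml import PipelineController

pipe = PipelineController(name="example pipeline", project="examples", version="1.0.0")

pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="train task",
    parents=["split_dataset", "get_latest_model_id"],
    parameter_override={
        # Resolve the previous step's artifact URL and model ID at runtime
        "Args/dataset_id": "${split_dataset.artifacts.split_dataset_id.url}",
        "Args/model_id": "${get_latest_model_id.clearml_model_id}",
    },
)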
and then?
The thing is, programmatically this is not easy to do via the API, because in the end the "function" (i.e. LCI) never exits: it connects over SSH and stays.
But you can query the Task it creates; the project is known, the user is known, and it is of a special type/tag.
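A hedged sketch of such a query (the project name, tag, and status filter are placeholders and depend on how the session tasks are actually created on your server):

from clearml import Task

# Look up the session tasks by project plus tag/status
session_tasks = Task.get_tasks(
    project_name="DevOps",
    task_filter={"tags": ["interactive"], "status": ["in_progress"]},
)

for t in session_tasks:
    print(t.id, t.name, t.status)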
Thanks LethalCentipede31, I think (3) is the most stable solution (as it doesn't require adding another package, and should work on any Python version / OS).
This is actually what we do for downloads.
Do you know if there is a minimum required python requests version?
TenseOstrich47 every agent instance has its own venv copy. Obviously every new experiment will remove the old venv and create a new one. Make sense?
If you edit the requirements to have
https://download.pytorch.org/whl/cpu/torch-1.4.0%2Bcpu-cp37-cp37m-linux_x86_64.whl
Obviously if you click on them, you will be able to compare based on specific metrics / parameters (either as a table or in parallel coordinates)
Thanks TroubledJellyfish71, I managed to locate the bug (and indeed it's the new aarch64 package support)
I'll make sure we push an RC in the next few days; until then, as a workaround, you can put the full link (http) to the torch wheel
BTW: 1.11 is the first version to support aarch64, if you request a lower torch version, you will not encounter the bug
BTW: you can quite easily add an option to set the offline folder, check here:
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/trains/config/__init__.py#L31
PRs are always appreciated :)
CheerfulGorilla72, as I understand there were some delays with the current release, so it is going to be out this week. The one after that includes this feature and, as far as I understand, would be mid Dec.