It is way too much to pass on env variable
ReassuredTiger98 I think it is using moviepy for the encoding... No?
In venv mode yes, in docker mode you can pass them by setting the -e flag on the docker_extra_flags
https://github.com/allegroai/trains-agent/blob/121dec2a62022ddcbb0478ded467a7260cb60195/docs/trains.conf#L98
BTW: what would be a reason to go back to self-hosted? (not sure about the SaaS cost, but I remember it was relatively cheap)
In the UI you can edit the base container image + add "SETUP SHELL SCRIPT", with any missing "apt update && apt-get install -y ..."
Hi DizzyPelican17
I'd like to configure requirements file, docker image, docker command for my pipeline controller, but it seems I cannot set it up. Am I missing something?
The decorator itself accepts those as arguments:
https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller#pipelinedecoratorcomponent
https://github.com/allegroai/clearml/blob/90f30e8d9a5ca9a1afa6b2e5ffccb96b0afe9c78/examples/pipeline/pipeline_from_decorator.py#L8
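A minimal sketch of passing those settings through the decorator, assuming a recent clearml SDK where `PipelineDecorator.component` accepts `packages`, `docker`, `docker_args` and `docker_bash_setup_script`. The image name, docker arguments, requirements path and the `add` step are placeholders made up for illustration, not values from this thread. The clearml import is deferred into the helper so the settings can be inspected without the SDK installed.

```python
# Hypothetical settings for one pipeline step; every value here is a placeholder.
COMPONENT_SETTINGS = {
    "return_values": ["total"],
    "packages": "./requirements.txt",  # or a list such as ["pandas>=1.5"]
    "docker": "python:3.9",
    "docker_args": "-e MY_FLAG=1",
    "docker_bash_setup_script": "apt-get update && apt-get install -y git",
}


def add(a, b):
    """Plain function that will become a pipeline step."""
    total = a + b
    return total


def as_component(func, settings=COMPONENT_SETTINGS):
    """Wrap `func` as a pipeline component (requires clearml to be installed)."""
    from clearml.automation.controller import PipelineDecorator

    return PipelineDecorator.component(**settings)(func)
```

Calling `as_component(add)` is equivalent to decorating `add` with `@PipelineDecorator.component(...)` using the same keyword arguments.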
I'd like to set up...
Hi RotundHedgehog76
Notice that the "queued" is on the state of the Task, as well as the tag
We tried to enqueue the stopped task at the particular queue and we added the particular tag
What do you mean by specific queue? This will trigger on any Queued Task with the 'particular-tag'?
Hi ResponsiveCamel97
What's the clearml-server version? How do you spin the server on your k8s cluster, helm ?
Another (minor) issue is that all the packages that are installed using git+https are cloned and installed twice, immediately one after the other
Yes this is so that we can better log the installed package name, not a major issue, but we just fixed a bug with derivative packages from git packages.
https://github.com/allegroai/trains/issues/196
You can set torch to be installed last:
post_packages: ["horovod", "torch"]
Which will make sure the "trains-agent" version (the one you specified in the "installed packages") will be installed last.
Hi LovelyHamster1
You mean when as a section name or a variable?
Could you change this example to include a variable that breaks the support ?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
So what will you query ?
No worries, and I hope you manage to get that backup.
Hi @<1561885921379356672:profile|GorgeousPuppy74>
Please use threads to ask questions, so we keep everything tidy
(and if you can please remove your first message, and merge it with the above one, this one and edit this one, for better readability)
Regarding the issue, you need to have clearml.conf in your home folder. I'm assuming this is /root/,
not /home/ubuntu/.
Also not sure why you need to expose ports...
Is it possible to add a checkbox in the profile settings that would set the maximum limit for comparison?
This feature is becoming more and more relevant.
So we are working on a better UI for it, so that this is not limited (it's actually the UI that is the limit here)
specifically you can add custom columns to the experiment table (like accuracy loss etc), and sort based on those (multiple values are also supported, just hold the Shift-Key). This way you can quickly explore ...
for example train.py & eval.py under the same repo
Ohh if this is the case, you might also consider using offline mode, so there is no need for backend
https://clear.ml/docs/latest/docs/guides/set_offline#setting-task-to-offline-mode
GiganticTurtle0 can you please add a github issue with feature request to clearml-agent? I think this is a great use case!
Hi @<1559711593736966144:profile|SoggyCow20>
How did you configure the clearml.conf? See here an example:
None
Hi @<1547028031053238272:profile|MassiveGoldfish6>
The issue I am running into is that this command does not give me the dataset version number that shows up in the UI.
Oh no, I think you are correct, it will not return the version per dataset (I will make sure we add it)
But with the dataset ID you can grab all the properties: ` Dataset.get(dataset_id="aabbcc").version `
wdyt
Could I use "register artifact"
I think this is somewhat deprecated and we should probably replace it with something similar to what you mentioned (i.e. watch a file change).
Right now the easiest way would be to manually upload the trainer_state.json every checkpoint: ` Task.current_task().upload_artifact(name='state', artifact_object='trainer_state.json') `
Hi PompousParrot44
So do you mean something like:
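` from clearml import Task  # on legacy installs the package is named trains
# fetch both tasks by ID, then download the last output model of each
task_model_a = Task.get_task(task_id='id_a')
task_model_b = Task.get_task(task_id='id_b')
model_a_file = task_model_a.models['output'][-1].get_local_copy()
model_b_file = task_model_b.models['output'][-1].get_local_copy() `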
What probably happens is first torch is installed via "trains-agent", then it installs the other packages and they require a different version, so pip automatically replaces it.
Hi @<1533982060639686656:profile|AdorableSeaurchin58>
Notice the scalars and console are stored on the elasticsearch DB, this is usually under /opt/clearml/data/elastic_7
If I log 20 scalars every 2000 training steps and train for 1 million steps (which is not that big an experiment), that's already 10k API calls...
They are batched together, so at least in theory if this is fast you should not get to 10K so fast, but a very good point
Oh nice! Is that for all logged values? How will that count against the API call budget?
Basically this is the "auto flush", it will flush (and batch) all the logs in a 30 sec period, and yes this is for all the logs (...
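To make the numbers above concrete, here is a rough back-of-the-envelope sketch. It assumes an idealized model in which every flush window produces at most one request; the real SDK may send more (or fewer), so treat the batched figure as an estimate only. The 24 hour run time is a made-up value for illustration.

```python
import math


def unbatched_calls(scalars_per_report, report_every, total_steps):
    """One API call per scalar at every reporting step (no batching)."""
    return scalars_per_report * (total_steps // report_every)


def batched_calls_estimate(run_seconds, flush_period_seconds=30):
    """Idealized model: one request per flush window, whatever it carries."""
    return math.ceil(run_seconds / flush_period_seconds)


# 20 scalars every 2000 steps over 1 million steps, as in the question above.
print(unbatched_calls(20, 2000, 1_000_000))    # 10000
# Hypothetical 24 hour run flushed every 30 seconds.
print(batched_calls_estimate(24 * 3600))       # 2880
```

Under this model the number of requests depends on wall-clock time rather than on how many scalars are reported.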
Hi @<1545216070686609408:profile|EnthusiasticCow4>
Many of the datasets we work with are generated by SQL queries.
The main question in these scenarios is, are those DBs stable?
By that I mean, generally speaking DBs serve applications, and from time to time they undergo migration (i.e. change in schema, more/less data etc).
The most stable way is to create a script that runs the SQL query, and creates a clearml dataset from it (that script becomes part of the Dataset, to have full tracta...
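A minimal sketch of such a script, assuming the query result fits in a single CSV file. It uses `sqlite3` only as a stand-in for whatever database is actually in use; `export_query`, `register_dataset`, the file name and the dataset/project names passed in are hypothetical. The clearml import is deferred so the export step can be tried without a server.

```python
import csv
import sqlite3
from pathlib import Path


def export_query(conn, query, out_dir, filename="data.csv"):
    """Run `query` and write the result (with a header row) to a CSV file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cursor = conn.execute(query)
    header = [col[0] for col in cursor.description]
    out_file = out_dir / filename
    with out_file.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(cursor)
    return out_file


def register_dataset(folder, name, project):
    """Snapshot `folder` as a ClearML dataset (requires clearml and a server)."""
    from clearml import Dataset

    dataset = Dataset.create(dataset_name=name, dataset_project=project)
    dataset.add_files(path=str(folder))
    dataset.upload()
    dataset.finalize()
    return dataset.id
```

Running `export_query` followed by `register_dataset` from one script keeps the query and the resulting snapshot together, so a later schema migration in the DB does not change data that was already captured.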
Or use python:3.9 when starting the agent
This is probably the best solution