Is the clearml-agent queue not available in the open source?
It is fully available in the open source version; what is missing is the SLURM connection. In the open source version the daemon is installed per machine (node) and spins up containers/venvs on that machine. The enterprise version adds support for using SLURM to provision the node. I hope it helps 🙂
so do you think it would be possible to spin up another daemon, which listens to this daemon, which then runs a slurm job?
This is exactly what the ...
Hi @<1610083503607648256:profile|DiminutiveToad80>
I think we will need more context for the log...
but I think there is something wrong with the GCP resource configuration of your autoscaler
Can you send the full autoscaler log and the configuration ?
Thanks!
Hmm from here : None
Could it be you do not have privileges to the resource, or that you did not provide credentials ?
Did that autoscaler work before ?
Hi @<1533620191232004096:profile|NuttyLobster9>
base_task_factory is a function that gets the node definition and returns a Task to be enqueued,
pseudo code looks like:
from clearml import Task
from clearml.automation import PipelineController

def my_node_task_factory(node: PipelineController.Node) -> Task:
    task = Task.create(...)
    return task
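For reference, a minimal sketch of how such a factory could be plugged into a pipeline step (pipeline/step names here are placeholders; base_task_factory is the add_step argument that accepts the callable):
pipe = PipelineController(name='my-pipeline', project='examples', version='1.0.0')
pipe.add_step(name='train', base_task_factory=my_node_task_factory)
pipe.start()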
Make sense ?
Hi @<1524560082761682944:profile|MammothParrot39>
By default you have the last 100 iterations there (not sure why you are only seeing the last 3), but this is configurable:
None
I wonder if I just need to join 2 docker-compose files to run everything in one session
Actually that could also work
But for reference, when I said IP I meant the actual host network IP, not 127.0.0.1 (which is the same as localhost)
Hi @<1610083503607648256:profile|DiminutiveToad80>
This depends on how you configure the agents in your clearml.conf
You can do https if user/pass are configured, and you can force SSH and it will auto-mount your host SSH folder into the container and use it.
None
https://github.com/allegroai/clearml-agent/blob/0254279ed5987fbc69cebae245efaea33aec1ff2/docs/cl...
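As a rough sketch, the relevant clearml.conf keys look something like this (values are placeholders):
agent {
    # HTTPS cloning with user/pass (or a personal access token as the password)
    git_user: "my-git-user"
    git_pass: "my-git-token"
    # or force SSH - the agent will mount the host SSH folder into the container
    force_git_ssh_protocol: true
}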
Thanks @<1634001106403069952:profile|DefeatedMole42>
A follow up, (1) how are you spinning the agent ? (2) could it be the docker image "ultralytics/yolov5" does not have Bash as entry point ?
you can force that with
@PipelineDecorator.component(return_values=['int'], cache=False,
                             task_type='training',
                             docker="ultralytics/yolov5",
                             docker_args="--entrypoint /bin/bash",
                             pa...
using the docker-compose file for the clearml-serving pipeline, do we also have to mount it somehow?
oh yes, you are correct, the values are passed using environment variables (easier when using docker compose)
In addition, you can add a mount from the host machine to a conf file:
volumes:
  - ${PWD}/clearml.conf:/root/clearml.conf
wdyt?
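For illustration, the environment-variable route could look roughly like this in the serving container's service definition (service name and variable list are a sketch, double check against your compose file):
services:
  clearml-serving-inference:
    environment:
      CLEARML_API_HOST: ${CLEARML_API_HOST}
      CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY}
      CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY}
    volumes:
      - ${PWD}/clearml.conf:/root/clearml.conf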
When you login with user/pass in the UI the same "process" happens and you get a Token to work with, this is the same as secret/key
Since in both cases you provide credentials and get back access token, it should work
(This is of course only if you are setting user/pass manually and disabling pass_hashed, as you have)
Woo, what a doozy.
yeah those "broken" pip versions are making our life hard ...
Hi @<1545216070686609408:profile|EnthusiasticCow4>
Many of the datasets we work with are generated by SQL query.
The main question in these scenarios is whether those DBs are stable.
By that I mean, generally speaking DBs serve applications, and from time to time they undergo migrations (i.e. changes in schema, more/less data, etc.).
The most stable way is to create a script that runs the SQL query and creates a ClearML dataset from it (that script becomes part of the Dataset, to have full tracta...
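As a rough sketch of such a script (the pandas/sqlalchemy choice, the DSN and all names are placeholders, not from the original thread):
import pandas as pd
from sqlalchemy import create_engine
from clearml import Dataset

# run the SQL query and snapshot the result to a local file
engine = create_engine("postgresql://user:pass@db-host/mydb")  # placeholder DSN
df = pd.read_sql("SELECT * FROM training_samples", engine)
df.to_csv("snapshot.csv", index=False)

# register the snapshot as a ClearML dataset version
ds = Dataset.create(dataset_name="sql-snapshot", dataset_project="examples")
ds.add_files("snapshot.csv")
ds.upload()
ds.finalize()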
Hi BrightGoat74
So merging general purpose plotly plots is very hard (i.e. putting both on the same graph)
But if you report using logger.report_scatter2d(...) the UI will merge the ROC curves into the same graph, wdyt?
https://clear.ml/docs/latest/docs/guides/reporting/scatter_hist_confusion_mat_reporting#2d-scatter-plots
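A minimal sketch of the idea (two ROC curves reported under the same title so the UI overlays them; the FPR/TPR points are made up):
from clearml import Task

task = Task.init(project_name="examples", task_name="roc-compare")
logger = task.get_logger()

# same title + different series -> both curves land on the same plot
logger.report_scatter2d(
    title="ROC", series="model_a",
    scatter=[(0.0, 0.0), (0.1, 0.6), (1.0, 1.0)],
    iteration=0, xaxis="FPR", yaxis="TPR", mode="lines")
logger.report_scatter2d(
    title="ROC", series="model_b",
    scatter=[(0.0, 0.0), (0.2, 0.5), (1.0, 1.0)],
    iteration=0, xaxis="FPR", yaxis="TPR", mode="lines")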
I mean clone the Task in the UI (right click Clone), then go to the execution Tab, to the "installed packages" section, then click on Edit -> go to the torchvision http link, and replace it with torchvision == 0.7.0
and save.
Then enqueue the Task (to the default queue) and see if the Agent can run it,
DeterminedToad86 Make sense ?
So what will you query ?
this is very odd, can you post the log?
Yes that was a tricky one, basically always blame pip 🙂
One machine (original parent) has:
agent.package_manager.type = pip
agent.package_manager.pip_version =
which does not upgrade pip and uses the preinstalled one (the log shows Unpacking python-pip-whl (20.0.2-5ubuntu1.10)).
The other one has:
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >= '3.10'
and it instal...
One suggestion is to make sure all agents have the same configuration. Another is to add pip into the "installed packages" section.
(Notice that in the next release we will specifically include it there, to avoid these kind of scenarios)
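If it helps, a sketch of the pip section you would want identical across every agent's clearml.conf (values taken from the second machine above):
agent {
    package_manager {
        type: pip
        pip_version: ["<20.2 ; python_version < '3.10'", "<22.3 ; python_version >= '3.10'"]
    }
}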
I think it is only in get_task
(and by default it is true)
I think query task does not filter the
Hi @<1657918706052763648:profile|SillyRobin38>
I'm curious to know if it's possible to prevent uploading a duplicate endpoint.
...and we attempt to upload it again without any changes to the command content,
Basically you overwrite it, and yes, possible 🙂
any other aspect, could the system prevent the duplicate upload?
so basically check the hash and say, no need to upload?
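As an illustration of that hash-check idea (plain Python, nothing ClearML-specific, all names are placeholders):
import hashlib
from pathlib import Path
from typing import Optional

def file_hash(path: str) -> str:
    # content hash of the artifact we are about to upload
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def should_upload(path: str, last_uploaded_hash: Optional[str]) -> bool:
    # skip the upload when the content has not changed since last time
    return file_hash(path) != last_uploaded_hash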
Thank you MuddyCrab47 !
Regarding model versioning:
All models are logged automatically by trains (no need to specify it, as long as you are using one of the automagically connected frameworks: PyTorch/Keras/TF/SKlearn)
You can see how it looks on the demoapp:
https://demoapp.trains.allegro.ai/projects/5371015f43f043b1b4ad7203c1ff4a95/models
Regarding Dataset management, we have a simple workflow demonstrated below, basically using artifacts as dataset storage, with very easy int...
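A rough sketch of the artifacts-as-dataset pattern, written with the current clearml import (back then the package was trains); project/task/file names are placeholders:
from clearml import Task

# producer task: store the dataset file as an artifact
task = Task.init(project_name="examples", task_name="create-dataset")
task.upload_artifact(name="dataset", artifact_object="data/train.csv")

# consumer task: fetch the artifact via the producer task's id
producer = Task.get_task(task_id=task.id)
local_csv = producer.artifacts["dataset"].get_local_copy()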
BTW: 0.14.3 solved the issue you are referring to, so you can import trains before parsing the args without an issue. Regarding passing project/name as parameters, a few thoughts: (1) you can always rename / move projects from the UI (2) If you are running it with trains-agent
there is no meaning to these arguments, as by definition the Task was already created... Maybe we should give an option to exclude a few arguments from argparser, I think this topic came up a few times... What d...
Hi PompousBeetle71 , what exactly is the scenario / problem we are trying to solve ?
Oh I see the pipeline controller itself (not the components) is the one with the repo
To fix that add at the top of the script the following:
from clearml import Task
Task.force_store_standalone_script()

@PipelineDecorator.pipeline(...)
That should do the trick
Hi GreasyPenguin14
Could you tell me what the differences are and why we should use ClearML data?
The first difference is in the approach itself, DVC ties the data with the code (i.e. git repo), where we (ClearML - but not just us) actually think data should be abstracted from the Code-Base and become a standalone argument, allowing users to build/execute against different dataset/versions. ClearML Data becomes part of the workflow as it is visible from the UI including the abili...
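To make the "dataset as a standalone argument" point concrete, a small hedged sketch (dataset/project and parameter names are placeholders):
from clearml import Task, Dataset

task = Task.init(project_name="examples", task_name="train")
params = task.connect({"dataset_id": ""})  # swappable from the UI per run

# resolve the data independently of the code / git revision
if params["dataset_id"]:
    ds = Dataset.get(dataset_id=params["dataset_id"])
else:
    ds = Dataset.get(dataset_name="my-dataset", dataset_project="examples")
local_path = ds.get_local_copy()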
PompousBeetle71 cool, next RC will have the argparse exclusion feature :)
Yep, and this is the root cause of the issue (But easily fixable) 🙂
okay thanks! let's pick it up on github 🙂