SmarmySeaurchin8 what do you think?
https://github.com/allegroai/trains/issues/265#issuecomment-748543102
task.connect_configuration
Can you try to manually install it and see what you are getting?
python3.10 -m pip install /home/boris/.clearml/pip-download-cache/cu117/torch-1.12.1+cu116-cp310-cp310-linux_x86_64.whl
Hi ColossalAnt7
Following up on SuccessfulKoala55's answer
I saw that there is a config file where you can specify specific users and passwords, but it currently requires
- mounting the configuration file (the one holding the user/pass) into the pod from a persistent volume.
I think the k8s way to do this would be to use mounted config maps and secrets.
You can use ConfigMaps to make sure the routing is always correct, then add a load-balancer (a.k.a. a fixed IP) for the users a...
Hi ConvolutedSealion94
Just making sure, you spun up the docker-compose of clearml-serving as well?
The issue I want to avoid is aborting of the dataset task that these regular tasks update.
HelpfulHare30 could you post pseudocode of the dataset update?
(My point is, I'm not sure the Dataset actually supports updating, as it needs to re-upload the previous delta snapshot). Wouldn't it be easier to add another child dataset and then use dataset.squash (like one would do in git)?
Yes, which looks like a lot, but you only need to do that once.
Auto scheduler would make (1) redundant (as it would spin the instance up/down based on the jobs in the queue)
It is deployed on an on premise, secured network that has no access to the outside world.
Is it password protected or something of that nature?
Perhaps we could find a different solution or work around, rather than solving a technical issue.
Solving it means allowing the python code to ask the JupyterLab server for the notebook file
However, once working with ClearML and using a venv (and not the default python kernel),
Are you saying on your specific setup (i.e. OpenShif...
EnviousStarfish54 are those scalars reported ?
If they are, you can just do:
task_reporting = Task.init(project_name='project', task_name='report')
tasks = Task.get_tasks(project_name='project', task_name='partial_task_name_here')
for t in tasks:
    t.get_last_scalar_metrics()
    task_reporting.get_logger().report_something
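The aggregation idea above could be fleshed out roughly as follows. This is only a sketch: the helper names (`flatten_last_scalars`, `build_report`) are made up for illustration, the choice to plot one point per source Task is an assumption, and the import assumes the `clearml` package (older installs would use `trains` instead).

```python
from typing import Dict, List, Tuple

# Shape returned by Task.get_last_scalar_metrics():
# {title: {series: {"last": float, "min": float, "max": float}}}
ScalarMetrics = Dict[str, Dict[str, Dict[str, float]]]


def flatten_last_scalars(metrics: ScalarMetrics) -> List[Tuple[str, str, float]]:
    """Turn the nested metrics dict into (title, series, last_value) rows."""
    rows = []
    for title, series_map in sorted(metrics.items()):
        for series, stats in sorted(series_map.items()):
            rows.append((title, series, stats["last"]))
    return rows


def build_report(project: str, name_pattern: str, report_name: str = "report") -> None:
    """Collect the last scalar values of matching Tasks into one reporting Task."""
    # Imported lazily so the pure helper above works without a server connection.
    from clearml import Task  # assumption: older installs use `from trains import Task`

    task_reporting = Task.init(project_name=project, task_name=report_name)
    logger = task_reporting.get_logger()
    tasks = Task.get_tasks(project_name=project, task_name=name_pattern)
    for index, t in enumerate(tasks):
        for title, series, last in flatten_last_scalars(t.get_last_scalar_metrics()):
            # Hypothetical layout: one point per source Task, indexed by its position.
            logger.report_scalar(title=title, series=series, value=last, iteration=index)


# Offline check of the flattening step with a hand-written metrics dict.
example = {"loss": {"train": {"last": 0.5, "min": 0.1, "max": 0.9}}}
rows = flatten_last_scalars(example)
```

Calling `build_report('project', 'partial_task_name_here')` would need a reachable server and valid credentials; the flattening helper can be tried on its own.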
My use case is when I have a merge request for a model modification I need to provide several pieces of information for our Quality Management System; one is to show that the experiment is a success and the model has some improvement over the previous iteration.
Sounds like a good approach 🙂
Obviously I don't want the reviewer to see all ...
Maybe publish the experiment and move it to a dedicated folder? Then even if they see all other experiments, they are under "development" p...
Hi SubstantialElk6
I'm not sure what you are asking 🙂
Basically the clearml-agent will pull a Task from an execution queue and execute it (based on the definition of the Task, i.e. git repo, python packages, docker image, etc.)
It has to be alive so all the "child nodes" could report to it....
Yes, I basically plan to use ClearML as a user-friendly cluster manager
and it is 🙂
I think the main "drawback" is that you cannot "reserve" nodes for the multi-node training. The easiest solution is to have a high-priority queue that is never used, and then have the DDP master process push into the high-priority queue, which will ensure these are the next Tasks to be executed (now the only thing that is missing is preemption of running Tasks, but this automation policy is unfortunate...
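To make the high-priority queue idea concrete, here is a toy simulation in plain Python. It does not use the ClearML API: `QueueSet`, the queue names and the task names are all hypothetical, and it assumes agents poll their queues in a fixed priority order.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class QueueSet:
    """Toy stand-in for named execution queues polled in priority order."""

    def __init__(self, priority_order: List[str]) -> None:
        self.priority_order = priority_order
        self.queues: Dict[str, Deque[str]] = {name: deque() for name in priority_order}

    def enqueue(self, queue: str, task: str) -> None:
        self.queues[queue].append(task)

    def pull_next(self) -> Optional[str]:
        """Return the next task, always draining higher-priority queues first."""
        for name in self.priority_order:
            if self.queues[name]:
                return self.queues[name].popleft()
        return None


queues = QueueSet(["ddp_high", "default"])

# Regular jobs are already waiting in the default queue.
queues.enqueue("default", "job-a")
queues.enqueue("default", "job-b")

# The DDP master pushes its worker Tasks into the otherwise unused high-priority queue.
queues.enqueue("ddp_high", "ddp-worker-1")
queues.enqueue("ddp_high", "ddp-worker-2")

# Agents that free up pull the DDP workers before any regular job.
order = []
task = queues.pull_next()
while task is not None:
    order.append(task)
    task = queues.pull_next()
```

The workers jump ahead of the queued jobs, but nothing here interrupts a Task that is already running, which is the missing preemption piece.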
Hi FiercePenguin76
should return all datasets from all projects?
Correct 🙂
The idea is that it is not necessary, using the trains-agent you can not only launch the experiment on a remote machine, you can override the parameters, not just cmd line arguments, but any dictionary you connected with the Task or configuration...
Can you also share the full log? The numbers seem off (and clearml cannot actually "invent" those numbers, they are coming from somewhere...)
This is very odd, can you also put here the file names? maybe an odd character is causing it?
Can you also test it with the latest clearml version (1.8.0) ?
BTW: this is probably more efficient than pickling
https://pandas.pydata.org/pandas-docs/version/1.1.5/reference/api/pandas.DataFrame.to_parquet.html
Regulatory reasons and proprietary data is what I had in mind. We have some projects that may need to be fully self hosted in the end
If this is the case then, yes do self-hosted, or talk to clearml sales to get the VPC option, but SaaS is just not the right option
I might take a look at it when I get a chance but I think I'd have to see if ClearML is a good fit for our use case before I can justify the commitment
I hope it is 🙂
Hi JitteryCoyote63 ,
These properties are usually not available on the UI and are used internally, hence the lack of documentation. Regarding the parent property, it will hold a parent Task.id (str); that said, it has no real effect on the Task itself. You can however search for Tasks with a specific parent ID (for example, this is how the hyper-parameter class is using this property)
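A search by parent ID might look like the sketch below. It assumes that the `task_filter` argument of `Task.get_tasks` is passed through to the backend query and that `parent` is an accepted filter field; the helper names are hypothetical.

```python
from typing import Any, Dict, List


def parent_filter(parent_task_id: str) -> Dict[str, Any]:
    """Build a backend filter matching Tasks whose parent is the given Task ID."""
    return {"parent": parent_task_id}


def find_child_tasks(parent_task_id: str) -> List[Any]:
    """Return all Tasks that list `parent_task_id` as their parent."""
    # Imported lazily; needs a configured server to actually run.
    from clearml import Task  # assumption: older installs use `from trains import Task`

    return Task.get_tasks(task_filter=parent_filter(parent_task_id))


# Placeholder ID, only to show the shape of the filter.
example_filter = parent_filter("<parent-task-id>")
```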
MelancholyChicken65 found it ! thank you for finding this issue.
I'm hoping to get an update soon 🙂
NaughtyFish36
what's the error you are getting?
Also, did you try setting force_git_ssh_protocol: true ?
https://github.com/allegroai/clearml-agent/blob/76c533a2e8e8e3403bfd25c94ba8000ae98857c1/docs/clearml.conf#L39
Hi SmarmyDolphin68
See some details here:
https://allegro.ai/docs/deploying_trains/trains_server_config/#network-and-security
Basically get an Azure load-balancer, it can also add the https on top of the http connection.
Check the details on load-balancers here
https://allegro.ai/docs/deploying_trains/trains_server_config/#sub-domains-and-load-balancers
I think this is the one:
https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-overview
EnviousStarfish54 you can use Task.set_credentials
Notice that OS environment or trains.conf will override the programmatic credentials
https://allegro.ai/docs/task.html#trains.task.Task.set_credentials
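A minimal sketch of setting credentials from code, with a check for the override case mentioned above. The helper names are hypothetical, and the list of environment variables is an assumption about which ones the SDK reads; trains.conf is not inspected here.

```python
import os
from typing import List, Mapping

# Assumption: these are the environment variables that take precedence.
OVERRIDING_ENV_VARS = ("TRAINS_API_HOST", "TRAINS_API_ACCESS_KEY", "TRAINS_API_SECRET_KEY")


def overriding_env_vars(environ: Mapping[str, str]) -> List[str]:
    """List the credential-related variables that are set and non-empty."""
    return [name for name in OVERRIDING_ENV_VARS if environ.get(name)]


def init_with_credentials(api_host: str, key: str, secret: str, project: str, name: str):
    """Set credentials programmatically, then create the Task."""
    # Imported lazily so the check above can run without the SDK installed.
    from trains import Task

    overrides = overriding_env_vars(os.environ)
    if overrides:
        print("Warning: these variables will override the programmatic credentials:", overrides)

    # set_credentials has to be called before Task.init
    Task.set_credentials(api_host=api_host, key=key, secret=secret)
    return Task.init(project_name=project, task_name=name)


# Offline check with a fake environment mapping.
found = overriding_env_vars({"TRAINS_API_ACCESS_KEY": "abc", "HOME": "/root"})
```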
Is the team open to PRs from external people?
Yes please do! PRs are welcomed! I thought we fixed the GitHub readme to reflect it, anyhow I'll make sure we do 🙂
Now will these 10 experiments be of different names? How will I know these are part of the 'mnist1' HPO case?
Yes (they will have the specific HP name/value combination).
FYI names are not unique so in theory you could have multiple experiments with the same name.
If you look under the Configuration tab, you will find all the configuration arguments for the experiment. You can also add specific arguments to the experiment table (click the cogwheel at the top right corner, and select...
Hi DeliciousBluewhale87
This is the latest clearml-serving (stable release at GTC at the end of the month)
https://github.com/allegroai/clearml-serving/tree/dev
Generally speaking, clearml-serving is a control plane, preprocessing, ML inference, with Nvidia Triton for DL inference (fully transparent).
It allows you to spin up an entire fully dynamic & scalable serving setup on top of a k8s cluster. Once you spin up the base containers, you can configure them live with a CLI, this includes adding new en...