Thanks GreasyPenguin66
How about: !curl
BTW, no need to rebuild the docker, next time you can always do !apt update && apt install -y <package here>
Try to add '--network host' to the docker args on the task you are launching
Hi SubstantialElk6
quick update, once clearml 1.1 is out, we will push the clearml-data improvement, supporting chunks per version (i.e. packaging the changeset into multiple zip files, instead of a single one as the current version does).
Regarding (1) storage limit server.
Ideally, we should be able to specify the batch size that we want to download, or even better, tie this in with the training by parallelising the data download, data preprocessing and batch trains.
With the nex...
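The overlap of download, preprocessing and training asked about above can be pictured with a small background-prefetch loop. This is only a sketch in plain Python, not the clearml-data API: `prefetch`, `fake_download` and `fake_preprocess` are hypothetical names standing in for whatever fetches and prepares a chunk.

```python
import queue
import threading


def prefetch(chunk_ids, download, preprocess, max_prefetch=2):
    """Yield preprocessed chunks while a background thread prepares the next ones."""
    buffer = queue.Queue(maxsize=max_prefetch)  # bounds how far ahead we fetch
    sentinel = object()

    def worker():
        try:
            for chunk_id in chunk_ids:
                buffer.put(preprocess(download(chunk_id)))
        finally:
            # Always signal the end, so the consumer never blocks forever
            buffer.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is sentinel:
            return
        yield item


def fake_download(chunk_id):
    # Hypothetical stand-in for downloading one chunk of a dataset
    return list(range(chunk_id * 3, chunk_id * 3 + 3))


def fake_preprocess(rows):
    # Hypothetical stand-in for a preprocessing step
    return [r * 2 for r in rows]


for batch in prefetch(range(3), fake_download, fake_preprocess):
    # The training step would go here; the next chunk is fetched meanwhile
    print(batch)
```

The bounded queue is the design choice that matters: it keeps at most `max_prefetch` chunks in memory, so the download never runs far ahead of training.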
Okay, so the idea behind the new decorator is not to group all the defined steps under the same script so that they share the same environment, but rather to simplify the process of creating scripts for each step and avoid manually calling Task.init on those scripts.
Correct, and allow users to more easily create Tasks from code.
Regarding virtual environment creation from caching, I will keep running benchmarks (from what you say it might be due to high workload ...
No idea, I just remember it is relatively old
Hi @<1846360404628869120:profile|HelpfulBadger74>
Is pixi a drop-in replacement for pip? Is it like uv?
Is mark_completed used to complete a task from a different process and close from the same process - is that the idea?
Yes
However, when I tried them out, mark_completed terminated the process that called mark_completed.
Yes if you are changing the state of the Task externally or internally the SDK will kill the process. If you are calling task.close() from the process that created the Task it will gra...
Did you run clearml-init after the pip install ?
This will fix it, the issue is the "no default value" that breaks the casting:
@PipelineDecorator.component(cache=False)
def step_one(my_arg=""):
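One way to picture why a missing default can break casting: if argument values arrive as strings, a default value is an obvious place to infer the target type from. The sketch below is plain Python and is not ClearML's actual implementation; `cast_like_default` and the two step functions are hypothetical names used only for illustration.

```python
import inspect


def cast_like_default(func, **raw_kwargs):
    """Cast each raw value to the type of the matching parameter's default."""
    sig = inspect.signature(func)
    casted = {}
    for name, raw in raw_kwargs.items():
        default = sig.parameters[name].default
        if default is inspect.Parameter.empty or default is None:
            # No default to infer a type from: the raw value passes through as-is
            casted[name] = raw
        else:
            casted[name] = type(default)(raw)
    return casted


def step_no_default(my_arg):
    return my_arg


def step_with_default(my_arg=0):
    return my_arg


print(cast_like_default(step_no_default, my_arg="3"))    # {'my_arg': '3'}
print(cast_like_default(step_with_default, my_arg="3"))  # {'my_arg': 3}
```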
Hmm, you are correct
Which means this is some conda issue, basically when installing from env file, conda is not resolving the correct pytorch version
Not sure why... Could you try to upgrade conda ?
how can you be snyk and lower than 0.96
Yep Snyk auto "patching" is great
as I mentioned wait for the GH sync tomorrow, a few more things are missing there
In the meantime you can just do ">= 0.109.1"
WittyOwl57
To get task IDs (e.g. all the completed tasks of a specific project) use:
task_ids = Task.query_tasks(project_name="examples", task_filter={'status': ["completed"]})
Then per task:
for t_id in task_ids:
    t = Task.get_task(t_id)
    conf_dict = t.get_configuration_object_as_dict(name="filter")
    task_param = t.get_parameters()
    task_param['filter'] = conf_dict
    # this is to enable to forcefully update parameters post execution
    t.mark_started(force=True)
    # update hyper-parame...
WickedGoat98 sorry, I missed the thread...
that the trains.conf has to be located on the node running the trains-agent.
Correct
The easiest way to check is to see if you can curl to the ip:port from the docker.
If you fail it is probably the wrong IP.
the IP you need to use is the IP of the machine running the docker-compose (not the IP of the docker inside that machine).
Make sense ?
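If curl is not available inside the container, the same reachability check can be sketched in Python. `can_connect` is a hypothetical helper name, and it only tells you whether a TCP connection to ip:port can be opened, not whether the server behind it is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False
```

For example, `can_connect("192.168.1.10", 8008)` run from inside the docker, where the address and port are placeholders for the machine running the docker-compose.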
Hi PunyGoose16 ,
I think the website is probably the easiest
https://clear.ml/contact-us/
I think they get back to you quite quickly
Can you send the full log? This is odd, it will by default use the python executable it (the agent) is running with.
Regardless you can specify the python executable to be used here:
https://github.com/allegroai/clearml-agent/blob/bd411a19843fbb1e063b131e830a4515233bdf04/docs/clearml.conf#L44
DeliciousBluewhale87 Is it ML or DL serving you are after ?
Hi @<1523703472304689152:profile|UpsetTurkey67>
You mean https://github.com/Lightning-AI/torchmetrics ?
Where are those stored?
Specifically for this one, this is the auto generated docstring from the actual code, so PR to the
https://github.com/allegroai/clearml/blob/e53a76b713910adaf87578c69e86f8154d4ab4c1/clearml/logger.py#L152
What sort of data would be stored in the venvs-build folder?
ClumsyElephant70 temporary (lifetime of the task execution) virtual environment, including the code etc. It is deleted and recreated for every new task launched (or restored from cache, if venvs_cache is enabled)
Based on the log you have shared:
OSError: [Errno 28] No space left on device
I would increase the storage.
https://github.community/t/github-actions-failing-with-errno-28-no-space-left-on-device/18164/10
https://stackoverflow.com/questions/70175977/multiprocessing-no-space-left-on-device
https://groups.google.com/g/ansible-project/c/4U6MyvyvthQ
I would start by increasing the size of the TMPDIR folder
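Before resizing anything, it may help to confirm how much room the temp folder actually has. A minimal sketch using only the standard library; `free_space_gb` is a hypothetical helper name.

```python
import shutil
import tempfile


def free_space_gb(path):
    """Return the free space, in GiB, on the filesystem that holds path."""
    usage = shutil.disk_usage(path)
    return usage.free / (1024 ** 3)


# gettempdir() honours the TMPDIR environment variable when it is set
tmp_dir = tempfile.gettempdir()
print(f"{tmp_dir}: {free_space_gb(tmp_dir):.2f} GiB free")
```

Running it inside the same container or user session as the failing task matters, since TMPDIR can differ between environments.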
Hi SubstantialBaldeagle49
2. Sure, follow the backup procedure and restore on the new server
3. Yes
task=Task.get_task(task_id='aa')
task.get_logger().report_scalar()
Ohh I see now, okay there are two entries on an artifact, the actual artifact (link to file somewhere) and the text preview of the artifact . I think the "preview" is the issue
How does ClearML select reference branch? Could it be that ClearML only checks "origin" branch?
Yes, I think we can quickly fix that, I'm just trying to figure out if there are downsides to running "git ls-remote --get-url" without origin
Basic setup:
glue service per "job template" (e.g. k8s resources, for example cpu requirement, or gpu requirement).
queue per glue service, e.g. cpu_machine queue, and 1xGPU queue
wdyt?
Hi @<1523719753099644928:profile|ImmenseMole52>
but tasks of this pipeline don't inherit docker and packages, why? I want to build or pull one docker and env for all pipeline steps only once, so how can I do it?
you have to specify the docker image for the pipeline Tasks, by default it will not assume it is the same as the pipeline controller, basically just pass:
pipe.add_function_step(
name="load_data",
function=load_data,
function_kwargs={"config": conf...
Hi @<1547028116780617728:profile|TimelyRabbit96>
It should process the new request A (this is a multi threading / async implementation)
Is this consistent with what you are seeing ?
off the top of my head, the self hosted is missing the autoscalers (there is an AWS CLI, but no UI or others), also missing the HPO UI feature,
but you should just check the detailed table here: None
Notice the error code:
Action failed <400/401: tasks.create/v1.0 (Invalid project id: id=first_attempt)>
If that is the case, the project ID is incorrect (project id is not the project name)
The pipeline stores the state of its previous run, specifically the executed steps.
In our case the executed step was reset (I assume) so it cannot find the output model you are referring to, hence crashing
CleanPigeon16 make sense ?