Hi @<1695969549783928832:profile|ObedientTurkey46>
Use --services-mode in the agent; it will run multiple Tasks on the same machine. This is usually associated with the services queue, but it can be run on any queue. This way you could easily have the same machine running those multiple "control" tasks.
wdyt?
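As a minimal sketch, spinning up such an agent would look something like this (the queue name here is just an assumption):
clearml-agent daemon --services-mode --queue services --detached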
Hi @<1524922424720625664:profile|TartLeopard58>
Yes, this is the default; it is designed to serve multiple models and scale horizontally.
Hi @<1547028116780617728:profile|TimelyRabbit96>
Start with the simple scikit-learn example:
https://github.com/allegroai/clearml-serving/tree/main/examples/sklearn
The pipeline example is more complicated, as it needs the base endpoints, so start simple 🙂
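For reference, registering the model from that example looks roughly like the following (the service ID, endpoint name, and paths are placeholders taken from the example layout):
clearml-serving --id <service_id> model add --engine sklearn --endpoint "test_model_sklearn" --preprocess "examples/sklearn/preprocess.py" --name "train sklearn model" --project "serving examples"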
The use case I have is to allow people on my team to run their workloads on a set of servers without stepping over each other..
So does that mean CPU-only workloads?
Also, are we worried about fairness? (i.e. someone "taking" all the CPU for themselves)
For now I have come to the conclusion that keeping a requirements.txt and making clearml parse
Maybe we could just have that as another option?
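In the meantime, here is a sketch of how this can look from the SDK side; recent clearml versions expose Task.force_requirements_env_freeze, though treat the exact usage here as my assumption:
from clearml import Task
# must be called before Task.init; uses the given requirements.txt instead of auto-detected packages
Task.force_requirements_env_freeze(force=True, requirements_file="requirements.txt")
task = Task.init(project_name="examples", task_name="pinned requirements")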
Could it be that it checks the root target folder, and you only have permissions on the subfolders, not the root?
I guess it won't due to the nature of services?
Correct, the k8s glue works differently. That said, I would actually use the Helm chart to spin a pod with the agent in services mode and venv mode.
Hi ConvolutedBee40
If we deploy a task to clearml-server, will it automatically scale?
The way it works is with agents and the agent glue: basically using k8s as a resource allocator and the clearml agent as an orchestrator. Did that answer the question?
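As a rough sketch, the glue itself is typically launched with the example runner from the clearml-agent repo (the script name and queue here are assumptions on my side):
python k8s_glue_example.py --queue k8s_scheduler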
No -- that section is blank,
This is the main issue; it should be filled with the requirements being auto-detected.
The entire script was executed from within vscode, and the Task was created, but it was not prefilled with anything?
Just making sure, did you call Task.init inside your code?
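A minimal sketch of what that looks like (project and task names are placeholders):
from clearml import Task
# creates the Task and auto-logs packages, git info, console output, etc.
task = Task.init(project_name="examples", task_name="my experiment")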
(with older clearml versions though…).
Yes, we added a content-type header for the files when uploading to S3 (so it is easier for users to serve them back). But it seems the Python 3.5 casting from Path to str breaks the mimetype call....
It seems the code is trying to access an S3 bucket; could that be the case? PanickyMoth78, any chance you can post the full execution log? (Feel free to DM so it won't end up being public)
StorageManager
Oh, it has no remove 🙁 StorageHelper.delete is the only way
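A rough sketch of that (StorageHelper is an internal helper and the URL is a placeholder, so treat the exact call as an assumption):
from clearml.storage.helper import StorageHelper
# get a helper bound to the target storage, then delete the remote object
helper = StorageHelper.get("s3://my-bucket/some/path")
helper.delete("s3://my-bucket/some/path/file.bin")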
Using the dataset.create command and the subsequent add_files and upload commands, I can see the upload action as an experiment, but the data is not seen in the Datasets webpage.
ScantCrab97 it might be that you need the latest clearml package installed on the client end (as well as the new server with the UI)
What is your clearml package version?
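For reference, a minimal sketch of the flow in question (names and paths are placeholders):
from clearml import Dataset
# create a dataset, attach local files, upload, and finalize so it appears in the Datasets page
ds = Dataset.create(dataset_project="examples", dataset_name="my dataset")
ds.add_files("./data")
ds.upload()
ds.finalize()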
My question is: what happens if I launch multiple doit commands in parallel that create new Tasks?
Should work out of the box.
I would like to confirm that current_task ...
Correct.
Hi LivelyLion31 I missed your S3 question, apologies. What did you guys end up doing?
BTW you could always upload the entire TB log folder as an artifact; it's as simple as task.upload_artifact('tensorboard', './tblogsfolder')
HandsomeCrow5 if you want to edit the Task object you can just use:
internal_task_representation = task.data
internal_task_representation.execution.script = ...
task._edit(execution=internal_task_representation.execution)
This will make sure you do not need to worry about the API version etc.; the Task object will take care of it.
BTW: it seems a few more people wanted this ability; maybe we should add a proper .edit method to Task. Thoughts?
In any case, do you have any suggestions for how I could at least hack tqdm to make it behave? Thanks
I think I know what the issue is: it seems tqdm is using an ANSI escape sequence instead of a plain CR; this is the 1b 5b 41 (ESC [ A, cursor up) sequence I see in the binary log.
Let me see if I can hack something for you to test 🙂
Actually, if you can send the full log of the Task, that would be great.
One additional question: if you import clearml after you call torch, does it work?
Okay, I found it. This is due to the fact that the newer versions are sending the events/images in a subprocess (it used to be a thread).
The creation of the object is done in the main process, updating the file index (in a round-robin manner), but the check itself happens in the subprocess, which is not "aware" of the used indexes (i.e. it is always 0; hence, when exceeding the history size, it skips it).
Can I make the Tasks that I'm adding to the pipeline also run locally, such that the entire pipeline runs locally?
Ohh I think only if you have an agent running on your machine.
What is the use case? (maybe we can add local execution as well?!)
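For reference, a sketch of how local pipeline execution looks in more recent clearml versions (names are placeholders, and treat the exact arguments as an assumption):
from clearml import PipelineController
pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")
# ... add steps with pipe.add_step(...) / pipe.add_function_step(...) ...
# run the controller and all of its steps on this machine instead of through agents
pipe.start_locally(run_pipeline_steps_locally=True)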
Hmm, that is odd. Can you send an email to support@clear.ml ?
i.e. run pip install --upgrade trains
Hi @<1523702932069945344:profile|CheerfulGorilla72>
I think more details are needed here :)