Hi @<1547028116780617728:profile|TimelyRabbit96>
It should process the new request A (this is a multi threading / async implementation)
Is this consistent with what you are seeing ?
So, what I am referring to is the ability of a system to provide some rigor and robustness in tracking experiments, and to encourage some thought about how things might be deployed early in the development process, whilst not being overly prescriptive and cumbersome
I cannot agree more!!
VivaciousPenguin66 We are working on trying to better understand how to solve this very delicate balancing act and offer some sort of "JIRA" for ML.
If this is okay with you, once product pe...
Ohh... I would not delete them then ... 😞
Maybe some kind of heuristic (files created over a week ago can be deleted?!)
Hi WickedBee96
How can I do that?
clearml-task
https://clear.ml/docs/latest/docs/apps/clearml_task#what-is-clearml-task-for
I only know how to run it in the agent by enqueuing the draft after running it on my local machine, so is there another way?
Or maybe you are looking for task.execute_remotely
https://clear.ml/docs/latest/docs/references/sdk/task#execute_remotely
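Roughly, a minimal sketch of that flow (the project, task and queue names here are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")

# everything above runs locally (capturing the repo, requirements and arguments);
# execute_remotely() then enqueues the task and exits the local process
task.execute_remotely(queue_name="default", exit_process=True)

# from this point on, the code only runs on the agent that pulled the task
```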
DeliciousBluewhale87 basically any solution that is compliant with the S3 protocol will work. An example:
output_uri=" :PORT/bucket/folder"
Are you sure Nexus supports this protocol ?
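For instance, a sketch of pointing a task's default output location at an S3-compatible endpoint (HOST, PORT, bucket and folder are placeholders; the endpoint credentials still go in clearml.conf):
```python
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="s3 compatible output",
    # any storage that speaks the S3 protocol should work here
    output_uri="s3://HOST:PORT/bucket/folder",
)
```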
I "think" nexus sits on top of a storage solution (like am object storage), meaning we can use the same storage solution Nexus is using.
Just to clarify we do not support the artifactory protocol Nexus provides for storing models/artifacts. But we do support it as a source for python packages used by the a...
But only 1 node will copy it.
they can only copy it after the first one is finished, and they are not aware it is trying to set up the exact same venv, hence the race
Could you expand on the use case of #18 ? How would you use it? What problem would it be solving ?
pipe.start_locally() will run the DAG compute part on the same machine, whereas pipe.start() will start it on a remote worker (if it is not already running on a remote worker)
basically "pipe.start()" executed via an agent, will start the compute (no overhead)
does that help?
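Something like this sketch (project, task and queue names are placeholders):
```python
from clearml import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="step1",
    base_task_project="examples",
    base_task_name="step 1 task",
)

# run the pipeline DAG logic on this machine (steps are still enqueued to agents)
pipe.start_locally()

# or enqueue the pipeline logic itself, so an agent runs the DAG part as well:
# pipe.start(queue="services")
```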
my question is how to recover, must I recreate the agents or is there another way?
Yes you have to recreate the Task (I assume they failed, no?!)
It is the folder that clearml creates and the folder we create ourselves to store the predictions
I see... If that is the case, the only solution I can think of is manually uploading the files with StorageManager(...) then get the url, and register it as debug_media or artifact:
logger.report_media("image", "type a", iteration=iteration, url=" ")
task.upload_artifact('a link', artifact_object=' ')
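Put together, a sketch of that (the project/task names and destination URL are placeholders):
```python
from clearml import Task, StorageManager

task = Task.init(project_name="examples", task_name="manual upload")

# upload the file manually and get back its remote url
url = StorageManager.upload_file(
    local_file="image.jpg",
    remote_url="s3://bucket/debug/image.jpg",
)

# register the uploaded file as debug media ...
task.get_logger().report_media("image", "type a", iteration=0, url=url)
# ... or as an artifact that only stores the link
task.upload_artifact("a link", artifact_object=url)
```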
Sorry @<1524922424720625664:profile|TartLeopard58> 😞 we probably missed it
clearml-session is still being developed 🙂
Which issue are you referring to ?
if they're mission critical, but rather the clearml cache folder?
hmmm... they are important, but only when starting the process. any specific suggestion ?
(and they are deleted after the Task is done, so they are temp)
So how do I solve the problem? Should I just relaunch the agents? Because they can't execute jobs now
Are you running in docker mode ?
If so you can actually delete mapped files (they will still be available inside the docker), just make sure you delete them X hours after they were created, and you should be fine.
wdyt?
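As a sketch of that heuristic (the cache path and the 24h threshold are assumptions, adjust them to whatever folder is actually mapped into the containers):
```python
import time
from pathlib import Path

CACHE_DIR = Path.home() / ".clearml" / "cache"   # assumed cache location
MAX_AGE_HOURS = 24

cutoff = time.time() - MAX_AGE_HOURS * 3600
for path in CACHE_DIR.rglob("*"):
    # only delete files old enough that running containers are done with them
    if path.is_file() and path.stat().st_mtime < cutoff:
        path.unlink()
```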
This is by design, they cannot use the exact same venv because if the code starts creating or changing files, it happens inside the venv and might cause them to crash.
That said if you are running with venv cache, the first one will create the venv and the second one will create a copy from the cache.
and then?
The thing is, programmatically this is not easy to expose as an API, because in the end the "function" (i.e. the CLI) never returns; it connects over SSH and stays connected
But you can query the Task it creates: the project is known, the user is known, and it has a special type/tag
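A rough sketch of querying it with the SDK (the project name, tag value and filter keys here are assumptions, check what clearml-session actually sets on the Task):
```python
from clearml import Task

tasks = Task.get_tasks(
    project_name="DevOps",                 # assumed default session project
    task_filter={
        "user": ["<user_id>"],             # the known user
        "system_tags": ["interactive"],    # assumed special tag
        "status": ["in_progress"],
    },
)
for t in tasks:
    print(t.id, t.name, t.status)
```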
Did you set 'force_git_ssh_protocol: true' ?
https://github.com/allegroai/clearml-agent/blob/249b51a31bee97d63f41c6d5542e657962008b68/docs/clearml.conf#L39
Let's try:
` echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && b...
What's the host you have in the clearml.conf ?
is it something like " http://localhost:8008 " ?
GreasyPenguin14 what's the clearml version you are using, OS & Python ?
Notice this happens on the "connect_configuration" that seems to be called after the Task was closed, could that be the case ?
Hi SucculentBeetle7
The parameters passed to add_step need to contain the section name (maybe we should warn if it is not there, I'll see if we can add it).
So maybe something like:
{'Args/param1': 1}
Or:
{'General/param1': 1}
Can you verify it solves the issue?
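In context, a sketch of what that looks like on the step itself (the project/task names are placeholders):
```python
from clearml import PipelineController

pipe = PipelineController(name="pipeline", project="examples", version="1.0.0")

# note the section prefix in the keys: "Args/" for argparse parameters,
# "General/" for parameters connected via task.connect()
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="train task",
    parameter_override={"Args/param1": 1},
)
```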
I see, when you run it manually (i.e. not via an agent) what do you have under the configuration tab in the UI (meaning do you see both argparser arguments there)?
StorageHelper is used internally.
I'll make sure we remove it from the examples/docs
Pass: task_filter=dict(system_tags=['-archived'])
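For example, a sketch assuming the filter is passed to Task.get_tasks (the project name is a placeholder):
```python
from clearml import Task

# '-archived' excludes tasks carrying the "archived" system tag
tasks = Task.get_tasks(
    project_name="examples",
    task_filter=dict(system_tags=["-archived"]),
)
```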
My question was about the automatically uploaded models. Those that were uploaded by clearml client.
So there is a way to add a callback, would that work?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/binding/frameworks/init.py#L137
def callback(_, model_info):
    model_info.name = "my new name"
    return model_info
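To wire it up, a sketch of registering that callback (assuming the WeightsFileHandler.add_pre_callback hook from the linked module is the right entry point):
```python
from clearml.binding.frameworks import WeightsFileHandler

def callback(_, model_info):
    # rename every automatically uploaded model before it is registered
    model_info.name = "my new name"
    return model_info

WeightsFileHandler.add_pre_callback(callback)
```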
I think the real issue is that I am not able to specify a platform for the model,
there is no need to specify it, remove it from the config.pbtxt - clearml-serving will add it automatically in the background
Hi JitteryCoyote63
If you want to refresh the task object, call task.reload(). It will also refresh the artifacts.
The reason for not always doing so when accessing the .artifacts object is speed optimization (it might be slow compared to dict access, and we assume users will expect it to behave like a dict)
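A quick sketch (the task id is a placeholder):
```python
from clearml import Task

task = Task.get_task(task_id="<task_id>")
task.reload()                       # re-pulls the task object, artifacts included
print(list(task.artifacts.keys()))
```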
An easier fix for now will probably be some kind of warning to the user that a task is created but not connected
That is a good point, maybe if you do not have a "main" Task, then we print the warning (with some flag to disable the warning) ?
This is sitting on top of the serving engine itself, acting as a control plane.
Integration with GKE is being worked on (basically KFServing as the serving engine)
Hi UnsightlyLion90
from my understanding the agent does the job of SLURM,
That is kind of correct (they overlap in some ways 🙂 )
Any guide of how to integrate both of them?
The easiest way is to just add the "Task.init()" call to your code, and use SLURM to schedule the job. This will make sure all jobs are fully logged (this also includes automatically uploading the models, artifact support, etc.)
Full SLURM support (i.e. similar to the k8s glue support), is currently ou...
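For example, a minimal sketch of a training script submitted via sbatch (project/task names are placeholders; Task.init is the only ClearML-specific line needed):
```python
from clearml import Task

task = Task.init(project_name="slurm-jobs", task_name="train run")

# ... the rest of the training code runs unchanged under SLURM, with metrics,
# models and artifacts logged automatically to the ClearML server
```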
and then in Preprocess:
self.model = get_model(task_id=os.environ['TASK_ID'], model_name=os.environ['MODEL_NAME'])
That's the part I do not get: Models have their own entity (with a UID), in contrast to artifacts, which are only stored on Tasks.
The idea is that when you are registering a model with clearml-serving, you can specify the model ID; this should replace the need for TASK_ID+model_name in your code, and clearml-serving will basically bring it to you
Basically this fun...
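As a sketch of fetching a model by its ID directly with the SDK (the MODEL_ID environment variable is a placeholder for however the ID is passed in):
```python
import os

from clearml import InputModel

model = InputModel(model_id=os.environ["MODEL_ID"])
weights_path = model.get_local_copy()   # downloads the registered weights locally
```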