BoredHedgehog47 can you test this one? Is it close to your code?
ResponsiveHedgehong88 so I would suggest using execute_remotely in your code: basically you start locally, make sure everything is passed as intended, then from within the code you call task.execute_remotely(...), which will stop the current process and enqueue the Task on the selected queue for the agent to execute.
https://github.com/allegroai/clearml/blob/0397f2b41e41325db2a191070e01b218251bc8b2/examples/advanced/execute_remotely_example.py#L127
This way you can both easily test...
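For reference, here is a minimal sketch of that flow (the queue name and connected parameters are placeholders, not from the original thread):
` from clearml import Task

task = Task.init(project_name="examples", task_name="remote_execution")

# run locally first and verify everything is connected as intended
params = task.connect({"batch_size": 64, "lr": 0.001})

# stop the local process and enqueue the Task for an agent to execute
task.execute_remotely(queue_name="default", clone=False, exit_process=True)

# everything from here on only runs on the agent
print("running remotely with:", params) `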
Okay, here is standalone code that should be close enough (if I missed anything let me know):
` import tempfile
from datetime import datetime
from pathlib import Path

import tensorflow as tf
import tensorflow_datasets as tfds

from clearml import Task

task = Task.init(project_name="debug", task_name="test")

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
    # cast uint8 pixels to float32 in [0, 1]
    return tf.cast(image, tf.float32) / 255.0, label
...
I still have name my_name, but the project name is my_project/.datasets/my_name rather than my_project/.datasets
Yes, this is the expected behavior
And I don't see any new projects / subprojects where that dataset creation Task is stored
They are marked "hidden", hence by default you cannot see them in the UI (so they will only appear in the Dataset page);
you can toggle the UI hidden flag by going to your settings page and selecting "Con...
BoredHedgehog47 I tried changing the order of imports on the sample code I shared before, it worked in both cases ...
yes you are correct, OS environment: TRAINS_PROC_MASTER_ID=1:task_id_here
Maybe before everything else, can you share some background on the rationale of starting a new subprocess?
DistressedGoat23
you can now access the weights model object:
pip install clearml==1.8.1rc0
then:
` from clearml.binding.frameworks import WeightsFileHandler

def callback(_, model_info):
    model_info.weights_object  # this is your xgboost object
    model_info.name = "my new name"
    return model_info

WeightsFileHandler.add_pre_callback(callback) `
I put two models in the same endpoint, then only one was running,
without providing a version number, you are overriding the models (because this is the same endpoint)
I started another docker container having a different port number and then the curls with the new model endpoint (with the new port) started working
Seems like misconfiguration on the first one?
, which apparently I can't specify when I establish the model endpoint, but I need to re-compose the docker container by...
Hi StickyShrimp60
The best way is through the APIs: you can query all the Tasks, then one by one use task.export_task together with task.get_reported_scalars, task.get_reported_plots, and task.get_reported_console_output to get the details. After that you can recreate the Task with import_task, and manually report the scalars/plots/console output.
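A rough sketch of that flow (the project name and report counts are placeholders; the target-server credentials have to be configured separately):
` from clearml import Task

# on the source server: export each task with its reported data
exported = []
for task in Task.get_tasks(project_name="my_project"):
    exported.append({
        "data": task.export_task(),
        "scalars": task.get_reported_scalars(),
        "plots": task.get_reported_plots(),
        "console": task.get_reported_console_output(number_of_reports=100),
    })

# on the target server (after switching clearml.conf / credentials):
for item in exported:
    new_task = Task.import_task(item["data"])
    # then manually re-report, e.g. new_task.get_logger().report_scalar(...) `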
btw: is a self-hosted server cheaper than the $15 a month hos...
I see what you mean.
` an_optimizer = HyperParameterOptimizer(
    base_task_id='39d2c27baa8145929b2e21f686a17046',
    hyper_parameters=[],
    objective_metric_title='epoch_accuracy',
    objective_metric_series='epoch_accuracy',
    objective_metric_sign='max',
    optimizer_class=aSearchStrategy,
    max_iteration_per_job=0,
    total_max_jobs=0,
    auto_connect_task=False,
)
print(an_optimizer.get_top_experiments(top_k=5)) `
Yes, this seems like it is stuck; could you test with the demo server?
(basically remove the clearml.conf it will connect automatically)
Hi LackadaisicalOtter14
However, whenever we spin up a session, ... always gets run and overwrites our configs

what do you mean by that?
Which config is being overwritten? (generally speaking, it just adds the OS environment it needs for the setup process)
So I might be a bit out of sync, but I think there should be Triton serving and OpenVINO serving built into it (or at least in progress).
Hi MortifiedCrow63
I have to admit this is very strange, I think the fact it works for the artifacts and not for the model is kind of a fluke ...
If you use the "wait_on_upload" argument in upload_artifact you end up with the same behavior. Even if uploaded in the background, the issue is still there; for me it was revealed the minute I limited the upload bandwidth to under 300 kbps. It seems the internal GS timeout assumes every chunk should be uploaded in under 60 seconds.
The default chunk...
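For context, this is the call being discussed (the artifact name and file path are placeholders):
` from clearml import Task

task = Task.init(project_name="debug", task_name="upload test")

# block until the artifact upload completes instead of returning immediately
task.upload_artifact("large_file", artifact_object="data.bin", wait_on_upload=True) `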
Hi NonsensicalSparrow35
So sorry I missed this thread
Basically your issue is the load balancer that blocks the POST command; you can change that by adding the following line to any clearml.conf:
api.http.default_method: "put"
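The same setting in nested HOCON form, if that is how your clearml.conf is laid out (equivalent to the dotted line above):
` api {
    http {
        default_method: "put"
    }
} `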
Hmm so I guess the actual code adds it into the reporting itself ...
How about we call: task.set_initial_iteration(0)
LazyFish41 just making sure, you built a container from the Dockerfile, and used it as the base docker image for the Task, is that correct?
Also notice the clearml-agent will not change the entry point of the docker, meaning if the entry point does not end with plain bash, it will not actually run anything.
GrievingTurkey78 sure, aws autoscaler can do that:
https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py
Notice you have in the Path:
/home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py/sfi
But you should have:
/home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py/
BTW: trains-agent is leaner and does not need plotly. Also, you can use the APIClient to basically query the entire system; would that be a better solution? See https://github.com/allegroai/trains-agent/blob/master/examples/archive_experiments.py
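A small sketch of the APIClient approach (the project id and paging values are placeholders):
` from clearml.backend_api.session.client import APIClient

client = APIClient()

# e.g. fetch the 10 most recently updated tasks in a project
tasks = client.tasks.get_all(
    project=["<project_id>"],
    order_by=["-last_update"],
    page=0,
    page_size=10,
)
for t in tasks:
    print(t.id, t.name, t.status) `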
To store all the debug samples; it can also store all the models (if you configure output_uri='http://file_server_here:8081').
Yes: instead of the file server, have 's3://<ip_of_minio>:9000/bucket' and make sure you add the credentials for the minio in the trains.conf.
Yes, basically once you have the credentials in the trains.conf, you could do StorageManager.get_local_copy('s3://<minio>:9000/bucket/file') (and also upload, of course)
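For example (the minio address is a placeholder; this assumes the credentials are already set under the sdk.aws.s3 section of the conf file):
` from clearml import StorageManager

# download a remote object into the local cache and get its local path
local_path = StorageManager.get_local_copy('s3://<minio>:9000/bucket/file')

# upload a local file to the same bucket
StorageManager.upload_file('local_file.bin', 's3://<minio>:9000/bucket/local_file.bin') `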
LudicrousParrot69 there is already
Task.add_tags
https://github.com/allegroai/clearml/blob/2d561bf4b3598b61525511a1a5f72a9dba74953e/clearml/task.py#L964
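Usage is a one-liner (the task id is a placeholder):
` from clearml import Task

task = Task.get_task(task_id="<task_id>")
task.add_tags(["reviewed", "best"]) `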
In theory this would be doable, but wouldn't it be a bit confusing? Also, why not always use containers if the host supports it? There is no real downside; just set the default docker image to something that is a good starting point.
Notice that you need to pass the returned scroll_id to the next call
scroll_id = response["scroll_id"]
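A minimal paging sketch, assuming a raw REST call (the endpoint, payload fields, and key-based auth shown here are assumptions; adapt them to whichever call you are scrolling through):
` import requests

api_server = "http://localhost:8008"  # your API server
auth = ("<access_key>", "<secret_key>")

scroll_id = None
while True:
    payload = {"task": "<task_id>", "batch_size": 500}
    if scroll_id:
        payload["scroll_id"] = scroll_id
    data = requests.post(f"{api_server}/events.get_task_log",
                         json=payload, auth=auth).json()["data"]
    if not data.get("events"):
        break
    # pass the returned scroll_id to the next call
    scroll_id = data["scroll_id"] `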
GrievingTurkey78 can you send the entire log?
BoredHedgehog47 you need to make sure "<path here>/train.py" also calls Task.init (again no need to worry about calling it twice with different project/name)
The Task.init call will make sure the auto-connect works.
BTW: if you do os.fork, then there is no need for the Task.init; the main difference is that Popen starts a whole new process, and we need to make sure the newly created process is auto-connected as well (i.e. calling Task.init)
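A sketch of what the subprocess side looks like (the script and project/task names are placeholders):
` # train.py -- launched from the parent via subprocess.Popen
from clearml import Task

# re-init inside the child process; reusing the parent's project/name is fine,
# this just re-attaches the new process so auto-connect works here too
task = Task.init(project_name="debug", task_name="test")

# ... training code: argparse, frameworks, etc. are auto-logged from here on ... `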
Hi TenderCoyote78
I'm trying to use clearml-agent in my dockerfile,
I'm not sure I'm following. Are you trying to create a docker container containing the agent inside? For what purpose?
(notice that the agent can spin up any off-the-shelf container; there is no need to add the agent into the container, it will take care of itself when running it)
Specifically for your Dockerfile:
RUN curl -sSL ... | sh
No need for this line.
COPY clearml.conf ~/clearml.conf
Try the ab...
Hi PlainSealion45
I am trying to automatically generate an online endpoint for inference when manually adding the tag "released" to a model.
So the "automatic" here means that the model endpoint will be updated with the latest model, but not that a new endpoint will be created.
Does that make sense ?
To add a new endpoint when tagging a model, you should combine it with ModelTrigger and have a function that calls clearml-serving to cr...
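A rough sketch of that combination (the serving service id, engine, and endpoint name are placeholders, and the deploy helper is hypothetical):
` import subprocess

from clearml.automation import TriggerScheduler

def deploy_model(model_id):
    # hypothetical deploy step: register an endpoint for the newly tagged
    # model through the clearml-serving CLI
    subprocess.run([
        "clearml-serving", "--id", "<serving_service_id>",
        "model", "add",
        "--model-id", model_id,
        "--engine", "triton",
        "--endpoint", "my_model_endpoint",
    ], check=True)

trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_model_trigger(
    name="deploy-on-released",
    schedule_function=deploy_model,
    trigger_on_tags=["released"],
)
trigger.start_remotely(queue="services") `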