Reputation
Badges 1
25 × Eureka!is there a way to increase the size of the text input for fields or a better way to handle lists?
No π
Maybe an easier way to use connect_configuration instead ? it will take an entire dict and store it as text (format is hocon, which is YAML/Json compatible, which means it is hard to break when editing)
callbacks.append( tensorflow.keras.callbacks.TensorBoard( log_dir=str(log_dir), update_freq=tensorboard_config.get("update_freq", "epoch"), ) )Might be! what's the actual value you are passing there?
CurvedHedgehog15 there is not need for :task.connect_configuration( configuration=normalize_and_flat_config(hparams), name="Hyperparameters", )Hydra is automatically logged for you, no?!
It has to be alive so all the "child nodes" could report to it....
os.environ['TRAINS_PROC_MASTER_ID'] = '1:da0606f2e6fb40f692f5c885f807902a' os.environ['OMPI_COMM_WORLD_NODE_RANK'] = '1' task = Task.init(project_name="examples", task_name="Manual reporting") print(type(task))Should be: <class 'trains.task.Task'>
Notice that you have to Have the task already started by the Master process
BoredHedgehog47 can you test this one? Is it close to your code ?
ResponsiveHedgehong88 so I would suggest using execute_remotely in your code, basically you start locally you make sure everything is passed as intended, then from within the code you call task.execute_remotely(...) which will stop the current process and enqueue the Task on the selected queue for the agent to execute.
https://github.com/allegroai/clearml/blob/0397f2b41e41325db2a191070e01b218251bc8b2/examples/advanced/execute_remotely_example.py#L127
This way you can both easily test...
Okay here is a standalone code that should be close enough? (if I missed anything let me know)
` import tempfile
from datetime import datetime
from pathlib import Path
import tensorflow as tf
import tensorflow_datasets as tfds
from clearml import Task
task = Task.init(project_name="debug", task_name="test")
(ds_train, ds_test), ds_info = tfds.load(
'mnist',
split=['train', 'test'],
shuffle_files=True,
as_supervised=True,
with_info=True,
)
def normalize_img(image, labe...
I still have name
my_name
, but the project name
my_project/.datasets/my_name
rather than
my_project/.datasets
Yes, this is the expected behavior
And I don't see any new projects / subprojects where that dataset creation Task is stored
They are marked "hidden" hence by default you cannot see them in the UI (so they will only appear in the Dataset page),
you can turn the UI hidden flag by going to your settings page and selecting "Con...
BoredHedgehog47 I tried changing the order of imports on the sample code I shared before, it worked in both cases ...
yes you are correct, OS environment:TRAINS_PROC_MASTER_ID=1:task_id_here
Maybe before everything else, can you share some background on the rational if starting a new sub process?
DistressedGoat23
you can now access the weights model objectpip install 1.8.1rc0
then:
` def callback(_, model_info):
model_info.weights_object # this is your xgboost object
model_info.name = "my new name"
return model_info
WeightsFileHandler.add_pre_callback(callback) `
I put two models in the same endpoint, then only one was running,
without providing version number, you are overriding the models (because this is the same endpoint)
I started another docker container having a different port number and then the curls with the new model endpoint (with the new port) started working
Seems like misconfiguration on the first one?
, which apparently I can't specify when I establish the model endpoint but I need to re compose the docker container by...
Hi @<1655744373268156416:profile|StickyShrimp60>
The best way is through APIs, you can query all the Tasks and then one by one use task.export_task with task.get_reported_scalars , task.get_reported_plots, task.get_reported_console_output, to get the details, after that you can recreate thee Task with import_task, and manually report the scalars/plots/console
btw: is self hosted server cheaper than the 15$ a month hos...
I see what you mean.an_optimizer = HyperParameterOptimizer( base_task_id='39d2c27baa8145929b2e21f686a17046', hyper_parameters=[], objective_metric_title='epoch_accuracy', objective_metric_series='epoch_accuracy', objective_metric_sign='max', optimizer_class=aSearchStrategy, max_iteration_per_job=0, total_max_jobs=0, auto_connect_task=False, ) print(an_optimizer.get_top_experiments(top_k=5))
Yes this seems like it is stuck, could you test with the demo server ?
(basically remove the clearml.conf it will connect automatically)
Hi LackadaisicalOtter14
However, whenever we spin up a session,Β
Β always gets run and overwrites our configs
what do you mean by that?
The what config are being overwritten? (generally speaking, it just add the OS environment it needs to for the setup process)
So I might be a bit out of sync, but I think there should be Triton serving and OpenVino serving built into it (or at least in progress).
Hi MortifiedCrow63
I have to admit this is very strange, I think the fact it works for the artifacts and not for the model is kind of a fluke ...
If you use "wait_on_upload" argument in the upload_artifact you end up with the same behavior. Even if uploaded in the background, the issue is still there, for me it was revealed the minute I limited the upload bandwidth to under 300kbps.It seems the internal GS timeout assumes every chunk should be uploaded in under 60 seconds.
The default chunk...
Hi @<1684010629741940736:profile|NonsensicalSparrow35>
So sorry I missed this thread π
Basically your issue is the load balancer that prevents the post command, you can change that, just add to any clearml.conf the following line:
api.http.default_method: "put"
Hmm so I guess the actual code adds it into the reporting itself ...
How about we call:task.set_initial_iteration(0)
LazyFish41 just making sure, you built a container from the docker file, and used it as base docker image for the Task, is that correct ?
Also notice the cleaml-agent will not change the entry point of the docker meaning if the entry point does not end with plain bash, it will not actually run anything
GrievingTurkey78 sure, aws autoscaler can do that:
https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py
Notice you have in the Path:/home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py/sfiBut you should have:/home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py/
BTW: trains-agent is leaner, and does not need plotly. And you can use the APIClient to basically query the entire system, would that be a better solution? See https://github.com/allegroai/trains-agent/blob/master/examples/archive_experiments.py