I am running from notebook and cell has returned
Well, the Task will close when you shut down the notebook
I notice that, in my Serving Service situated in the DevOps project, the "endpoints" section doesn't seem to get updated when I tag a new model with "released".
It takes it a few minutes (I think 5 min is the default) to update.
Notice that you need to add the model with
model auto-update --engine triton --endpoint "test_model_pytorch_auto" ...
Not with model add (if for some reason that does not work please let me know)
No need to pass the model version i.e. 1
you can ...
There was an issue in some versions where seaborn plots were blank. Is that the case?
GiganticTurtle0 My apologies, I made a mistake, this will not work
In the example above, "step_two" is executed "instantaneously", meaning it only launches the remote task; it does not actually wait for it.
This means an exception will not be raised in the "correct" context (actually it will be raised in a background thread).
So I think we need a callback function; otherwise there is no actual way to catch the failed pipeline task.
Maybe the only re...
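To illustrate the point about the exception surfacing in a background thread, here is a minimal stdlib-only sketch. It is an analogy, not the ClearML pipeline API: `launch_step`, `failing_step`, and `on_failure` are hypothetical names, and a plain `threading.Thread` stands in for the remote task launch.

```python
import threading


def launch_step(step_fn, on_failure):
    """Start step_fn in a background thread and return immediately.

    An exception raised by step_fn never reaches the caller's try/except;
    it is handed to the on_failure callback instead.
    """
    def runner():
        try:
            step_fn()
        except Exception as exc:  # report the error instead of losing it
            on_failure(exc)

    thread = threading.Thread(target=runner)
    thread.start()
    return thread


def failing_step():
    # Hypothetical step that always fails
    raise RuntimeError("step_two failed")


failures = []

try:
    # Returns right away, so this try block cannot see the step's exception
    worker = launch_step(failing_step, on_failure=failures.append)
    caught_in_caller = False
except RuntimeError:
    caught_in_caller = True

worker.join()
print(caught_in_caller, [str(exc) for exc in failures])
```

The caller's `except` never fires; only the callback sees the failure, which is why some callback hook is needed to react to a failed step.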
I think the main risk is that ClearML upgrades to MongoDB vX.Y, Mongo has changed the API (which they did because of Amazon), and now the API call (aka the Mongo driver) stops working.
Long story short, I would not recommend it
The second run prints out the same (non) "random" numbers as the first run
ClearML sets the initial random seed for you, basically trying to help with reproducibility. That said, inside the function you can always do:
import random
import time
random.seed(time.time())
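A small sketch of why the numbers repeat and how reseeding changes that. It uses only the standard `random` module; the fixed seed value 1337 and the `draw` helper are assumptions for illustration, standing in for whatever seed gets set before your code runs.

```python
import random
import time


def draw(n=3):
    """Return n pseudo-random integers from the global generator."""
    return [random.randint(0, 999) for _ in range(n)]


# A fixed seed (hypothetical value) gives the same sequence on every run
random.seed(1337)
first = draw()
random.seed(1337)
second = draw()
print(first == second)

# Reseeding from the clock makes each run start from a different state
random.seed(time.time())
third = draw()
print(third)
```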
Hi ScaryKoala63
Which versions are you using (clearml / lightning) ?
ConvolutedChicken69
Basically, clearml-data needs to store an immutable copy of the delta changes per version. If the files are already uploaded, there is a good chance they could be modified...
So in order to make sure you have a clean immutable copy, it will always upload the data (notice it also packages everything into a single zip file, so it is easy to manage).
Hi MysteriousWalrus11
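As a rough illustration of the "delta per version, packaged into one zip" idea, here is a stdlib-only sketch. This is not how clearml-data is implemented; `file_hashes` and `package_delta` are hypothetical helpers, and comparing SHA-256 digests against the parent version is an assumption made for the example.

```python
import hashlib
import io
import zipfile
from pathlib import Path


def file_hashes(folder):
    """Map each relative file path under folder to its SHA-256 digest."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def package_delta(folder, parent_hashes):
    """Zip only the files that are new or changed relative to parent_hashes.

    Returns (zip_bytes, current_hashes) so the next version can diff again.
    """
    current = file_hashes(folder)
    changed = [
        name for name, digest in current.items()
        if parent_hashes.get(name) != digest
    ]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in changed:
            archive.write(Path(folder) / name, arcname=name)
    return buffer.getvalue(), current
```

Because the archive is built from the files as they are at packaging time, later edits to the originals cannot alter what the version contains.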
"parents": [
"step_two",
"step_four"
],
Seems like step 5 depends on steps 2+4. How did you create it? What did the console say?
Could it be you're not actually passing any output from step3? How is it dependent on it?
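To check this kind of dependency question offline, one option is to model the "parents" lists as a plain graph. The sketch below uses only the standard library; the step names other than "step_two" and "step_four" and the overall pipeline shape are hypothetical, not taken from your pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the list of its parents
pipeline = {
    "step_one": [],
    "step_two": ["step_one"],
    "step_three": ["step_two"],
    "step_four": ["step_two"],
    "step_five": ["step_two", "step_four"],
}

# A valid execution order: every step appears after all of its parents
order = list(TopologicalSorter(pipeline).static_order())


def depends_on(step, ancestor, graph):
    """True if ancestor is reachable by following parents from step."""
    stack = list(graph[step])
    seen = set()
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False


print(order)
print(depends_on("step_five", "step_four", pipeline))
print(depends_on("step_five", "step_three", pipeline))
```

With parents of only "step_two" and "step_four", the last step has no path back to step three, so nothing forces it to wait for that step.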
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L86
you can just pass the instance of the OptunaOptimizer, you created, and continue the study
Task.current_task().connect(training_args, name='hugggingface args')
And you should be able to change them when launching remotely
SmallDeer34 btw: "set_parameters_as_dict" will replace all the arguments (and is one way) ...
Hi PanickyMoth78
Hmm, I think it might be that it overrides it with the environment variables it sets ...
Option one, add: sdk.development.default_output_uri: "
"
https://github.com/allegroai/clearml-agent/blob/d96b8ff9068233103053bfe8305fb88274c2c9bf/docs/clearml.conf#L404
Option two (which should work as well): environment { CLEARML_FILES_HOST: "
" }
https://github.com/allegroai/clearml-agent/blob/d96b8ff9068233103053bfe8305fb88274c2c9bf/docs/clearml.conf#L421
Agent works when I am running it from a virtual environment but gets stuck in the same place all the time when I am using Docker
Can you please provide a log? I'm not sure what "stuck" means here
Thank you for noticing the issue!
I have to specify the full uri path ?
No it should be something like " s3://bucket "
the model files management is not fully managed like for the datasets ?
They are
Basically it is the same as "report_scatter2d"
Ok, just my ignorance then?
LOL, no, it is just that with a single discrete parameter the strategy makes less sense
Hi QuizzicalDove0
I guess the reason is that the integration is literally 2 lines, and it takes less time to execute the code on a system with a working env (we assume there is one) than to configure all the git, Python packages, arguments, etc.
All that said, you can create an experiment from code using Task.import_task https://allegro.ai/docs/task.html#trains.task.Task.import_task
RipeWhale0 are you taking them from here?
https://artifacthub.io/packages/helm/allegroai/clearml
These are maybe good features to include in ClearML:
or
.
Sure, we should probably add a section to the docs explaining how to do that
Other approach is creating my own API on the top of clearml-serving endpoints and there I control each tenant authentication.
I have to admit that to me this is a much better solution (than my/bento integrated JWT option). Generally speaking I think this is the best approach, it separates authentication layer from execution ...
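A minimal sketch of that separation, assuming a thin layer in front of the serving endpoint. Everything here is hypothetical: `TenantGateway`, the per-tenant token table, and the `forward` callable (which stands in for the actual request to the serving endpoint) are illustration only, not part of clearml-serving.

```python
import hmac


class TenantGateway:
    """Checks a per-tenant token, then hands the payload to the backend."""

    def __init__(self, tenant_tokens, forward):
        self._tokens = dict(tenant_tokens)  # tenant name -> expected token
        self._forward = forward             # callable that performs inference

    def handle(self, tenant, token, payload):
        expected = self._tokens.get(tenant)
        # Constant-time comparison avoids leaking token contents via timing
        if expected is None or not hmac.compare_digest(expected, token):
            return 401, {"error": "unauthorized"}
        return 200, self._forward(payload)


def fake_forward(payload):
    # Stand-in for the real call to the serving endpoint
    return {"echo": payload}


gateway = TenantGateway({"tenant_a": "secret-a"}, fake_forward)
print(gateway.handle("tenant_a", "secret-a", {"x": 1}))
print(gateway.handle("tenant_a", "wrong", {"x": 1}))
```

The serving side never sees credentials, so the authentication scheme can change without touching the model endpoints.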
Is there a way to document these non-standard entry points
BattyCrocodile47 you should see the "run" in the Args section under Configuration
in case of HF you should see "-m huggingface" and then the rest in the Args section
(if this does not work, then I assume this is a bug)
The idea is of course that you can always enqueue and reproduce, so if that part is broken we should fix it
First I would check the CLI command it will basically prefill it for you:
https://clear.ml/docs/latest/docs/apps/clearml_task
Specifically to your question, working directory "." is the root of the git repo
But I would avoid adding it manually. Use the CLI; it will either ask you to provide the info or take the git repo details from the local copy
Hi AttractiveShrimp45
Well, I would use the Task.connect
to add a section with any configuration you are using. For example: Task.current_task().connect(my_dict_with_conf_for_data, name="dataset51")
wdyt?
DeliciousBluewhale87 great, we have progress. This looks like it is inheriting from the system packages:
For example, you can see in the log: Requirement already satisfied: future>=0.16.0 in /usr/local/lib/python3.6/dist-packages
Now the question is which docker it is running, because as you can see at the bottom of the log, tensorflow is not listed as installed, but other packages installed inside the docker are listed.
wdyt?
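One way to check from inside the container whether a package resolves from the system dist-packages or from a virtual environment is a small probe like the one below. It is a sketch using only the standard library; `in_virtualenv` and `package_origin` are hypothetical helper names, and the two package names are just the ones mentioned in this thread.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != sys.base_prefix


def package_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


for name in ("future", "tensorflow"):
    origin = package_origin(name)
    print(f"{name}: {origin or 'not installed'} (venv: {in_virtualenv()})")
```

A path under /usr/local/lib/.../dist-packages would point to the docker image's system packages rather than the environment the agent created.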
Hmm, this is odd indeed, let me verify (thanks! HarebrainedOstrich43)
I'm running agent inside docker.
So this means venv mode...
Unfortunately, right now I can not attach the logs, I will attach them a little later.
No worries, feel free to DM them if you feel it is too much to post here
CloudyHamster42 you mean that when you set sdk.metrics.tensorboard_single_series_per_graph
to True and you rerun the experiment, you are still getting multiple series on the same graph?
What's your Trains version?