Yep... they are pushing "heavy" users away from these instances. Nothing you can really do; maybe switch to Azure/GCP, but it might be the same there
I prefer serving my models in-house and only performing the monitoring via ClearML.
clearml-serving
is an infrastructure for you to run models
to clarify, clearml-serving
is running on your end (meaning this is not SaaS where a 3rd party is running the model)
By the way, I saw there is a project dashboard app which might support the visualization I am looking for. Is it suitable for such use case?
Hmm interesting, actually it might, it does collect metrics over time ...
JitteryCoyote63 What did you have in mind?
Okay how do I reproduce it ?
Hi RattySparrow90
Are the models I defined to be served, e.g. via the CLI, downloaded to the serving pod
Yes, this is done automatically and online (i.e. when you update using the CLI/API), based on the models/endpoints you set
So that they are physically lying there as a file I can see in the filesystem?
They are, and cached there
Or is it more the case that the pod gets the model when needed/when an API call for this model is incoming?
I...
but instead, they cannot be run if the files they produce were not committed.
The thing with git is that if you have new files and you did not add them, they will not appear in the git diff, hence they will be missing when running from the agent. Does that sound like your case?
Hi JitteryCoyote63
Wait a few hours, there is a new fix, I'll make sure we upload it later today (scheduled to be there anyhow, I'll push it forward)
(some packages that are not inside the cache seem to be missing and then everything fails)
How did that happen?
However, it's very interesting why the ability to cache the step impacts the artifacts' behavior
From your log:
videos_df = StorageManager.download_file(videos_df)
Seems like "videos_df" is the DataFrame itself; why are you trying to download the DataFrame? I would expect you to download the pandas file, not a DataFrame object.
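As a rough sketch of what I'd expect (the URL and file name here are hypothetical, assuming the artifact is stored as a CSV):
```python
import pandas as pd
from clearml import StorageManager

# Download the remote pandas file (hypothetical URL) to the local cache ...
local_csv = StorageManager.download_file("s3://my-bucket/data/videos.csv")
# ... then load it into a DataFrame
videos_df = pd.read_csv(local_csv)
```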
It's the safest way to run multiple processes and make sure they are cleaned afterwards ...
but the logger info is missing.
What do you mean? Can I reproduce it ?
BTW: The code sample you shared is very similar to how you create pipelines in ClearML, no?
(also, could you expand on how you create the Kedro node? On the face of it, it looks like another function in the repo, but I have a feeling I'm missing something)
So the thing is, regardless of the link, you should end up with:
helper
<clearml.storage.helper.StorageHelper object at 0x....>
But the code that failed seemed to return None, which makes me suspect the url itself is somehow broken.
Any chance you have a space before the "s3://" ?
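As a quick sanity check (a sketch; the bucket/key are hypothetical), you can resolve the helper yourself and make sure the URL has no stray whitespace:
```python
from clearml.storage.helper import StorageHelper

url = " s3://my_bucket/some/key ".strip()  # strip accidental spaces around the URL
helper = StorageHelper.get(url)
print(helper)  # expected: <clearml.storage.helper.StorageHelper object at 0x...>, not None
```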
BTW : what's the clearml version you are using ?
If you want to quickly test it:
pip install clearml-agent
Then assuming Task id is aabbcc
Run:
clearml-agent execute --id aabbcc
You should be able to trace if the package was installed
The main issue is applying the patch requires git clone and that would fail on local (not pushed) commits.
What's the use case itself ?
(btw, if you copy the uncommitted changes into a file and git apply it, it will work)
If you run each "main" process as a single experiment, just don't call Task.init in the scheduler
Hi ConfusedPig65
Any Keras model will be automatically uploaded if you pass an upload URL to the Task init:
task = Task.init('examples', 'keras upload test', output_uri=" ")
(You can also pass s3://bucket/folder to output_uri, or change the default output_uri in the clearml.conf file)
After this line, any Keras model will be automatically uploaded (you will see it under the Artifacts tab)
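For reference, a minimal end-to-end sketch (project/task names and bucket are hypothetical, assuming tf.keras):
```python
from clearml import Task
from tensorflow import keras

# Any model Keras saves after this call is uploaded to output_uri automatically
task = Task.init(project_name='examples', task_name='keras upload test',
                 output_uri='s3://my-bucket/models')  # hypothetical bucket

model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer='adam', loss='mse')
model.save('my_model.h5')  # shows up under the task's Artifacts/Models in the UI
```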
Accessing models from executed tasks:
```python
trains_task = Task.get_task('task_uid_here')
last_check...
```
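The snippet above got cut off; as a rough sketch of the idea (assuming the clearml SDK, where a task exposes its reported output models):
```python
from clearml import Task

trains_task = Task.get_task('task_uid_here')
# Output models are listed in reporting order; take the last checkpoint
last_checkpoint = trains_task.models['output'][-1]
local_weights = last_checkpoint.get_local_copy()  # downloads the model file locally
```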
Hi JitteryCoyote63 ,
The easiest would probably be to list the experiment folder, and delete its content.
I might be missing a few things but the general gist should be:
```python
from trains.storage import StorageHelper

h = StorageHelper('s3://my_bucket')
files = h.list(prefix='s3://my_bucket/task_project/task_name.task_id')
for f in files:
    h.delete(f)
```
Obviously you should have the right credentials
are models technically Tasks and can they be treated as such? If not, how to delete a model permanently (both from the server and from AWS storage)?
When you call Task.delete() it actually goes over all the models/artifacts and deletes them from the storage
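As a sketch (assuming the current clearml SDK, where Task.delete accepts a flag for removing stored artifacts/models):
```python
from clearml import Task

task = Task.get_task(task_id='aabbcc')  # hypothetical task id
# Deletes the task and, with this flag, the models/artifacts it stored (e.g. on S3)
task.delete(delete_artifacts_and_models=True)
```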
Okay, I think I understand, but I'm missing something. It seems you call get_parameters from the old API; is your code actually calling get_parameters? The trains-agent runs the code externally, so whatever happens inside the agent should have no effect on the code. So who exactly is calling task.get_parameters, and, well, why? :)
Hmm GreasyLeopard35 can you specify the range you are passing to the HPO, as well as the type of optimization class ? (grid/random/optuna etc.)
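To clarify what I mean by the range and the optimizer class, a hedged sketch (all names/values here are hypothetical):
```python
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.optuna import OptimizerOptuna  # could also be GridSearch / RandomSearch

optimizer = HyperParameterOptimizer(
    base_task_id='aabbcc',  # template task to clone for each trial
    hyper_parameters=[
        # continuous range, e.g. for a learning rate
        UniformParameterRange('General/learning_rate', min_value=1e-4, max_value=1e-1),
    ],
    objective_metric_title='validation',
    objective_metric_series='loss',
    objective_metric_sign='min',
    optimizer_class=OptimizerOptuna,
)
```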
report_text does not, this is very weird
Okay this seems to be the issue.
Just making sure the Task status is "running" and task.get_logger().report_text("something")
does not report a thing ?
Do you see it on your screen?
Can you test without the "Task.debug_simulate_remote_task / init" ?
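A minimal check, as a sketch (project/task names are hypothetical), without the remote-simulation call:
```python
from clearml import Task

task = Task.init(project_name='debug', task_name='report_text check')
task.get_logger().report_text('something')  # should show up in the task's console log
```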
It might be that the worker was killed before it unregistered; you will see it there but the last update will be stuck (after 10 min it will be automatically removed)
Hi DeliciousBluewhale87
clearml-agent 0.17.2 was just released with the fix, let me know if it works
I'm guessing the extra index URL can be a URL to the github repo of interest?
The extra index URL is exactly what you would be passing to pip install, meaning it has to comply with the PyPI artifactory API.
Make sense ?
BTW MagnificentSeaurchin79 just making sure here:
but I don't see the loss plot in scalars
This is only with Detect API ?