So as you say, it seems hydra kills these
Hmm let me check in the code, maybe we can somehow hook into it
AttractiveCockroach17 can I assume you are working with the hydra local launcher ?
I understand I can change the docker image for a component in the pipeline, but for the
it isn't possible.
you can always call Task.current_task().connect()
from the pipeline function itself, to connect more configuration arguments. Anything you add via the function itself works too: all the pipeline logic function arguments become pipeline arguments, it's kind of neat 🙂 Regarding docker, the idea is that you use a very basic python docker image (the default for the services queue) for all...
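As a rough pure-Python sketch of that idea (this is not ClearML's actual implementation, just the mechanism): the controller can introspect the pipeline logic function's signature and expose every argument, together with its default, as a pipeline parameter.

```python
import inspect

def pipeline_logic(dataset_id: str = "abc", epochs: int = 10, lr: float = 0.01):
    """Toy pipeline logic function; the names and defaults are made up."""
    pass

# Collect every argument of the logic function as a pipeline parameter,
# using the declared default as the parameter's initial value.
pipeline_args = {
    name: param.default
    for name, param in inspect.signature(pipeline_logic).parameters.items()
}
print(pipeline_args)  # {'dataset_id': 'abc', 'epochs': 10, 'lr': 0.01}
```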
I was expecting the remote experiment to behave similarly, why do I need to import pandas there?
The only problem is that the remote code did not install pandas; once the package is there we can read the artifacts
(this is in contrast to the local machine where pandas is installed and so we can create/read the object)
Does that make sense ?
wouldn't it be possible to store this information in the clearml server so that it can be implicitly added to the requirements?
I think you are correct, and if we detect that we are using pandas to upload an artifact, we should try and make sure it is listed in the requirements
(obviously this is easier said than done)
And if instead I want to force "get()" to return me the path (e.g. I want to read the csv with a library that is not pandas) do we have an option for that?
Yes, c...
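For context, ClearML artifacts also expose get_local_copy(), which returns a file path rather than a deserialized object, so any csv library can open it. The sketch below simulates the pattern with a local file so it runs without a server; the commented line shows where the real call would go, and the artifact name "data" is made up.

```python
import csv
import os
import tempfile

# Stand-in for the downloaded artifact: a plain csv file on disk.
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([["a", "b"], ["1", "2"]])

# local_path = task.artifacts["data"].get_local_copy()  # the real ClearML call
local_path = path  # stand-in so this sketch is runnable anywhere

# Read the csv with the stdlib csv module instead of pandas.
with open(local_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)  # [['a', 'b'], ['1', '2']]
```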
We do upload the final model manually.
wait you said upload manually, and now you are saying "saved automatically", I'm confused.
Using the dataset.create command and the subsequent add_files, and upload commands I can see the upload action as an experiment but the data is not seen in the Datasets webpage.
ScantCrab97 it might be that you need the latest clearml
package installed on the client end (as well as the new server with the UI)
What is your clearml package version ?
SolidSealion72 this makes sense, clearml deletes artifacts/models after they are uploaded, so I have to assume these are torch internal files
HurtWoodpecker30 could it be you hit a limit of some sort ?
Hmm so yes that is true, if you are changing the bucket values you will have to manually also adjust it in grafana. I wonder if there is a shortcut here, the data is stored in Prometheus, and I would rather try to avoid deleting old data, Wdyt?
ComfortableShark77 it seems the clearml-serving is trying to Upload data to a different server (not download the model)
I'm assuming this has to do with the CLEARML_FILES_HOST, and missing credentials. It has nothing to do with downloading the model (that as you posted, will be from the s3 bucket).
Does that make sense ?
GiganticTurtle0
I think that what you are looking for is:
param_dict = {'key': 1234}
task.connect(param_dict, name='general')
Notice that when this code runs manually (i.e. not by the agent), the dict is stored on "general" parameter section of the Task.
But when the code is executed by the Agent, the opposite happens and the parameters from the "general" section of the Task are put back into the param_dict. Here the casting is done based on the type of the original values.
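A toy illustration of that round trip (this is not ClearML internals, just the behaviour): values come back from the Task as strings, and each one is cast using the type of the original value in the dict.

```python
# Cast an incoming string back to the type of the original dict value.
def cast_like(original, incoming_str):
    if isinstance(original, bool):  # check bool before int: bool subclasses int
        return incoming_str.lower() in ("true", "1")
    return type(original)(incoming_str)

param_dict = {"key": 1234, "lr": 0.1, "debug": False}
# Values as they might come back from the Task's "general" section
# after being edited in the UI (all strings):
from_server = {"key": "5678", "lr": "0.5", "debug": "true"}

param_dict = {k: cast_like(param_dict[k], v) for k, v in from_server.items()}
print(param_dict)  # {'key': 5678, 'lr': 0.5, 'debug': True}
```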
Generall...
Then in theory (since the backend is python based) you just need to find a base docker image to build it on.
The log is missing, but the Kedro logger does print to sys.stdout in my local terminal.
I think the issue might be that it starts a new subprocess, and that subprocess is not "patched" to capture the console output.
That said if an agent is running the entire pipeline, then everything is logged from the outside, so whatever is written to stdout/stderr is captured.
Thank you for noticing the issue!
MagnificentPig49 that's a good question, I'll ask the guys 🙂
BTW, I think the main issue is actually making sure there is enough documentation on how to compile it...
Anyhow I'll update here
For example, for some of our models we create pdf reports, that we save in a folder in the NFS disk
Oh, why not as artifacts? At least you will be able to access them from the web UI, and avoid NFS credential hell 🙂
Regrading clearml datasets:
https://www.youtube.com/watch?v=S2pz9jn26uI
Because it lives behind a VPN and github workers don't have access to it
makes sense
If this is the case, I have to admit that combining offline-mode and remote execution makes sense, no?
3.a
Regarding the model query, sure from Python or restapi you can query based on any metadata
https://clear.ml/docs/latest/docs/references/sdk/model_model/#modelquery_modelsmodels
3.b
If you are using clearml-serving then check the docs / readme, but in a nutshell yes you can.
If the inference code is batch processing, which means a Task, then of course you can, and launch it; check the clearml agent f...
Was trying to figure out how the method knows that the docker image ID belongs to ECR. Do you have any insight into that?
Basically you should have the docker service login before running the agent, then the agent uses docker to run the image from the ECR.
Make sense ?
because fastai's tensorboard doesn't work in multi gpu
keep me posted when this is solved, so we can also update the fastai2 interface,
What's the output_uri
you are passing ?
And the OS / Python version?
I think this is due to the label map including some keys with a "." in them.
Hi TenseOstrich47 what do you mean "label"
But adding a simple force_download flag to the get_local_copy
That sounds like a good idea
BattyLion34 I have a theory, I think that any Task on the "default" queue will fail if a Task is running on the "service" queue.
Could you create a toy Task that just prints "." and sleeps for 5 seconds and then prints again?
Then while that Task is running, from the UI launch the Task that passed on the "default" queue. If my theory holds it should fail, and then we will be getting somewhere 🙂
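The toy Task described is just a few lines, something like:

```python
import time

# Minimal keep-alive task: print, sleep, print again, so it occupies the
# queue long enough to observe how the other queue behaves meanwhile.
for _ in range(2):
    print(".")
    time.sleep(5)
```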
Also, how do pipelines compare here?
Pipelines are a type of Task, so like Tasks you can clone and enqueue them, or set them as the target of the trigger.
the most flexible solution would be to have some way of triggering the execution of a script in the parent task environment,
This is the exact idea of the TriggerScheduler
What am I missing here?