I have an idea, can you try with: task = Task.init(..., reuse_last_task_id=False)
I have a suspicion it starts the Tasks in parallel, and the "reuse_last_task_id" causes them to "reuse the same task locally" which makes them overwrite the configuration of one another.
What's strange is that the remote jobs, as soon as they are launched, if I compare their configs while they are still pending, they all have the right, different configs, but later, while running,
Wait, I think I found it. Usually with hydra you configure everything from overrides / config, so when launched remotely it looks at those by default. But with the launch plugin it should be overwritten with the Task
task = Task.init(...)
task.set_parameter(name="Hydra/_allow_omegaconf_ed...
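A minimal sketch of forcing that override (the parameter name is truncated above; "Hydra/_allow_omegaconf_edit_" is my assumption for the full name):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="hydra-remote-launch")
# assumption: the truncated parameter above is "Hydra/_allow_omegaconf_edit_";
# setting it tells the remote run to take the OmegaConf stored on the Task
# instead of re-reading the local overrides/config
task.set_parameter(name="Hydra/_allow_omegaconf_edit_", value=True)
```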
It looks like the tag being used is hardcoded to 1.24-18. Was this issue identified and fixed in later versions?
BoredHedgehog47 what do you mean by "hardcoded 1.24-18"? A tag to what? I think I lost context here
Are you suggesting the default "ubuntu:18.04" is somehow contaminated ?
This is an official Ubuntu container (nothing to do with ClearML), this is Very Very odd...
So as you say, it seems hydra kills these
Hmm let me check in the code, maybe we can somehow hook into it
AttractiveCockroach17 can I assume you are working with the hydra local launcher ?
I understand I can change the docker image for a component in the pipeline, but for the
it isn't possible.
you can always call Task.current_task().connect()
from the pipeline function itself. To connect more configuration arguments you basically add them via the function itself; all the pipeline logic function arguments become pipeline arguments, it's kind of neat 🙂 Regarding docker, the idea is that you use a very basic python docker (the default for services) queue for all...
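A rough sketch of what I mean (a hypothetical pipeline; project and argument names are placeholders):

```python
from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.pipeline(name="example-pipeline", project="examples", version="0.1")
def pipeline_logic(learning_rate: float = 0.01, epochs: int = 10):
    # every argument of the pipeline function automatically becomes a pipeline argument
    extra_config = {"optimizer": "adam"}
    # connect additional configuration from inside the pipeline function itself
    Task.current_task().connect(extra_config, name="extra")
```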
I was expecting the remote experiment to behave similarly, why do I need to import pandas there?
The only problem is that the remote code did not install pandas; once the package is there we can read the artifacts
(this is in contrast to the local machine where pandas is installed and so we can create/read the object)
Does that make sense ?
wouldn't it be possible to store this information in the clearml server so that it can be implicitly added to the requirements?
I think you are correct, and if we detect that we are using pandas to upload an artifact, we should try and make sure it is listed in the requirements
(obviously this is easier said than done)
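Until we automate that, a manual workaround sketch (call it before Task.init in the script that creates the artifact):

```python
from clearml import Task

# explicitly add pandas to the Task requirements so the agent installs it remotely
Task.add_requirements("pandas")
task = Task.init(project_name="examples", task_name="artifact-demo")
```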
And if instead I want to force "get()" to return me the path (e.g. I want to read the csv with a library that is not pandas) do we have an option for that?
Yes, c...
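Something along these lines (a sketch; the task id and artifact name are placeholders):

```python
from clearml import Task

task = Task.get_task(task_id="<source-task-id>")
# get() deserializes the artifact (e.g. back into a pandas DataFrame), while
# get_local_copy() only downloads it and returns the local file path,
# so you can read the csv with any library you like
csv_path = task.artifacts["my_dataframe"].get_local_copy()
```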
We do upload the final model manually.
wait you said upload manually, and now you are saying "saved automatically", I'm confused.
Using the dataset.create command and the subsequent add_files and upload commands, I can see the upload action as an experiment, but the data is not seen in the Datasets webpage.
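For reference, what I'm running is roughly this (a sketch; project, name and path are simplified):

```python
from clearml import Dataset

ds = Dataset.create(dataset_name="my_dataset", dataset_project="datasets")
ds.add_files(path="data/")
ds.upload()
# finalize() closes the dataset version once all files are uploaded
ds.finalize()
```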
ScantCrab97 it might be that you need the latest clearml package installed on the client end (as well as the new server with the UI)
What is your clearml package version ?
SolidSealion72 this makes sense, clearml deletes artifacts/models after they are uploaded, so I have to assume these are torch internal files
Martin, if you want, feel free to add your answer in the stackoverflow so that I can mark it as a solution.
Will do 🙂 give me 5
Done 🙂
HurtWoodpecker30 could it be you hit a limit of some sort ?
Hmm so yes that is true, if you are changing the bucket values you will have to manually also adjust it in grafana. I wonder if there is a shortcut here, the data is stored in Prometheus, and I would rather try to avoid deleting old data, Wdyt?
it would be clearml-server's job to distribute to each user internally?
So you mean the user will never know their own S3 access credentials?
Are those credentials unique per user or once "hidden" for all of them?
ComfortableShark77 it seems the clearml-serving is trying to Upload data to a different server (not download the model)
I'm assuming this has to do with the CLEARML_FILES_HOST, and missing credentials. It has nothing to do with downloading the model (that as you posted, will be from the s3 bucket).
Does that make sense ?
GiganticTurtle0
I think that what you are looking for is:
param_dict = {'key': 1234}
task.connect(param_dict, name='general')
Notice that when this code runs manually (i.e. not by the agent), the dict is stored on "general" parameter section of the Task.
But when the code is executed by the Agent, the opposite happens and the parameters from the "general" section of the Task are put back into the param_dict; here the casting is done based on the type of the original values.
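A minimal sketch of that round-trip (project/task names are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="connect-demo")

# running locally: these values are stored under the "general" section of the Task
param_dict = {"key": 1234, "lr": 0.001}
task.connect(param_dict, name="general")

# running under an agent: the values from the "general" section (possibly edited
# in the UI) are written back into param_dict, cast to the original types
print(param_dict["lr"])
```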
Generall...
Then in theory (since the backend is python based) you just need to find a base docker image to build it on.
The log is missing, but the Kedro logger is printing to sys.stdout in my local terminal.
I think the issue might be that it starts a new subprocess, and that subprocess is not "patched" to capture the console output.
That said if an agent is running the entire pipeline, then everything is logged from the outside, so whatever is written to stdout/stderr is captured.
Thank you for noticing the issue!
MagnificentPig49 that's a good question, I'll ask the guys 🙂
BTW, I think the main issue is actually making sure there is enough documentation on how to compile it...
Anyhow I'll update here
Hi ZanySealion18
sorry missed that one
The cache doesn't work, it attempts to download the dataset every time.
just making sure the dataset itself contains all the files?
Once I used the clearml-data add --folder * CLI, everything worked correctly (though all files recursively ended up in the root; luckily they were all named differently).
Not sure I follow here, is the problem the creation of the dataset or fetching it? Is this a single version or multi...
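For the fetching side, this is roughly what I'd expect to hit the cache (a sketch; the dataset id is a placeholder and the version should be finalized):

```python
from clearml import Dataset

ds = Dataset.get(dataset_id="<dataset-id>")
# get_local_copy() returns a cached local folder; repeated calls on the same
# finalized version should reuse the cache instead of re-downloading
local_path = ds.get_local_copy()
```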
For example, for some of our models we create pdf reports, that we save in a folder in the NFS disk
Oh, why not as artifacts? At least you will be able to access them from the web UI, and avoid VFS credential hell 🙂
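Something like this, as a sketch (the artifact name and pdf path are placeholders):

```python
from clearml import Task

task = Task.current_task()
# registers the pdf as an artifact, so it can be viewed/downloaded from the web UI
task.upload_artifact(name="report", artifact_object="reports/model_report.pdf")
```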
Regarding clearml datasets:
https://www.youtube.com/watch?v=S2pz9jn26uI
Because it lives behind a VPN and GitHub workers don't have access to it
makes sense
If this is the case, I have to admit that combining offline-mode and remote execution makes sense, no?