Are you running a self-hosted/enterprise server or on app.clear.ml? Can you confirm that the field in the screenshot is empty for you?
Or are you using the SDK to create an autoscaler script?
I agree, I came across the same issue too. But your post helps make it clear, so hopefully it can be pushed! 🙂
Also, the answer to blocking on the pipeline might be in the .wait()
function: https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller#wait-1
TimelyPenguin76 I can't seem to make it work though; on which object should I run the .wait() method?
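If it helps, wait() is called on the PipelineController instance itself, after the pipeline has been launched with start(). A rough sketch based on the linked reference (pipeline name, project, queue, and the step are placeholders; exact blocking behaviour of start() may vary between clearml versions):

```python
from clearml import PipelineController

# Placeholder names throughout; the point is the start() + wait() pattern.
pipe = PipelineController(name="demo pipeline", project="examples", version="1.0.0")

def step_one():
    print("running step one")

# Register a single function step so the controller has something to run.
pipe.add_function_step(name="step_one", function=step_one)

pipe.start(queue="services")  # launch the controller on a queue
pipe.wait()                   # block until the pipeline run terminates
pipe.stop()                   # clean up the controller
print("pipeline finished")
```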
With what error message did it fail? I would expect it to fail, because you finalized this version of your dataset by uploading it 🙂 You'll need a mutable copy of the dataset before you can remove files from it I think, or you could always remove the file on disk and create a new dataset with the uploaded one as a parent. In that way, clearml will keep track of what changed in between versions.
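For reference, a minimal sketch of the "new version with a parent" approach (project, dataset, and file names are placeholders):

```python
from clearml import Dataset

# Placeholder project/dataset/file names.
# Grab the finalized dataset and create a new, mutable version on top of it.
parent = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
child = Dataset.create(
    dataset_project="my_project",
    dataset_name="my_dataset",
    parent_datasets=[parent.id],
)

# Remove the file from the new version (path is relative to the dataset root),
# then upload and finalize the new version.
child.remove_files(dataset_path="file_to_remove.txt")
child.upload()
child.finalize()
```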
Indeed that should be the case. By default Debian is used, but it's good that you ran with a custom image, so now we know it isn't clear that more permissions are needed.
Great! Please let me know if it works when adding this permission, we'll update the docs in a jiffy!
That wasn't my intention! Not a dumb question, just a logical one 😄
Oohh interesting! Thanks for the minimal example though. We might want to add it to the docs as an example of dynamic DAG creation 🙂
Well I'll be had, you're 100% right, I can recreate the issue. I'm logging it as a bug now and we'll fix it asap! Thanks for sharing!!
Hello!
What is the use case here, why would you want to do that? If they're the same dataset, you don't really need lineage, no?
Can you share the exact error message? That will help a ton!
Hi ThoughtfulGrasshopper59 !
You're right, we should probably add the convenient allow_archived option to .get_tasks() as well.
That said, for now this can be a workaround:
```python
from clearml import Task

print([task.name for task in Task.get_tasks(
    project_name="TAO Toolkit ClearML Demo",
    task_filter=dict(system_tags=['archived'])
)])
```
Specifically, `task_filter=dict(system_tags=['archived'])` should be what you need.
If I'm not mistaken:
Fileserver - model files and artifacts.
MongoDB - all experiment objects are saved there.
Elastic - console logs, debug samples, and scalars are all saved there.
Redis - caching related to the agents, I think.
Hi @<1533257278776414208:profile|SuperiorCockroach75> , the clearml experiment manager will try to detect your package requirements from its original environment. Meaning that if you run the code and it imports e.g. SQLAlchemy, then it will log the exact version of SQLAlchemy you have installed locally.
When you run only get_data.py locally and have the experiment manager track it, can you then look at the task that is made in the clearml webUI and check the installed packages section? ...
It depends on how complex your configuration is, but if config elements are all that will change between versions (i.e. not the code itself) then you could consider using parameter overrides.
A ClearML Task can have a number of "hyperparameters" attached to it. But once that task is cloned and in draft mode, one can EDIT these parameters and change them. If the task is then queued, the new parameters will be injected into the code itself.
A pipeline is no different, it can have pipeline par...
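To make the override mechanism concrete, here is a minimal sketch (project, task, and parameter names are placeholders):

```python
from clearml import Task

# Placeholder project/task names.
task = Task.init(project_name="examples", task_name="config demo")

# connect() registers the dict as editable hyperparameters on the task.
config = {"learning_rate": 0.01, "batch_size": 32}
config = task.connect(config)

# When a cloned (draft) copy of this task is edited and enqueued,
# the edited values are injected back into this dict at runtime.
print(config["learning_rate"])
```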
Hi @<1523701949617147904:profile|PricklyRaven28> sorry that this is happening. I tried to run your minimal example, but get an `IndexError: Invalid key: 5872 is out of bounds for size 0` error. That said, I get the same error without the code running in a pipeline. There seems to be no difference between simply running the code and the pipeline (for me). Do you have an updated example, maybe also including getting a local copy of an artifact, so I can check?
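For reference, this is roughly the pattern I mean by getting a local copy of an artifact (task ID and artifact name are placeholders):

```python
from clearml import Task

# Placeholder task ID and artifact name.
task = Task.get_task(task_id="<your-task-id>")
local_path = task.artifacts["my_artifact"].get_local_copy()
print(local_path)
```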
Hi! Have you run clearml-serving create ...
first? Usually you'd make what's called a "control plane task" first, that will hold all your configuration. Step 4 in the initial setup instructions is where you'll find it!
I'm sorry, but I will need more context. Where exactly is this log from? Can you confirm you're working with a self-hosted open source server? Which container/microservices is giving you this last error message?
Hi Alejandro! I'm running the exact same Chromium version, but haven't encountered the problem yet. Are there specific parameter types where it happens more often?
Do you have a screenshot of what happens? Have you checked the console when pressing f12?
Could you use tags for that? In that case you can easily filter on which group you're interested in, or do you have a more impactful UI change in mind to implement groups? 🙂
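If it helps, tagging from the SDK is a one-liner (project, task, and tag names here are only placeholders), and the tags then show up as filters in the UI:

```python
from clearml import Task

# Placeholder project/task/tag names; tags become filterable labels in the web UI.
task = Task.init(project_name="examples", task_name="tagged experiment")
task.add_tags(["team-a", "nightly"])
```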
I tried answering them as well, let us know what you end up choosing, we're always looking to make clearml better for everyone!
effectively making us lose 24 hours of GPU compute
Oof, sorry about that, man 😞
That's what happens in the background when you click "new run". A pipeline is simply a task in the background. You can find the task by querying, and you can clone it too! It is placed in a "hidden" folder called .pipelines as a subfolder of your main project. Check out the settings; you can enable "show hidden folders".
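As a rough sketch (project and pipeline names are placeholders, and the exact hidden project path may differ on your server), finding and cloning that controller task could look like this:

```python
from clearml import Task

# Placeholder names; the pipeline controller is a regular Task stored under
# the hidden ".pipelines" sub-project of your main project.
controller = Task.get_task(
    project_name="my_project/.pipelines/my_pipeline",
    task_name="my_pipeline",
)

# Cloning and enqueuing it is essentially what the "new run" button does.
new_run = Task.clone(source_task=controller, name="my_pipeline (manual run)")
Task.enqueue(new_run, queue_name="services")
```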
Just to be sure I understand you correctly: you're saving/dumping an sklearn model in the clearml experiment manager, then want to serve it using clearml serving, but you do not wish to specify the model input and output shapes in the CLI?
Hi @<1523701949617147904:profile|PricklyRaven28> just letting you know I still have this on my TODO, I'll update you as soon as I have something!
Also, this might be a little stupid, sorry, but your torch save command saves the model in the current folder, whereas you give clearml the 'model_folder/model' path instead. Could it be that the path is just incorrect?
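Just to illustrate what I mean (model and path names are made up), saving to the exact path you later hand to clearml would look something like this:

```python
import os
import torch
import torch.nn as nn

# Placeholder model and path; save to the path you will reference later,
# not to the current working directory.
model = nn.Linear(4, 2)
model_path = os.path.join("model_folder", "model.pt")
os.makedirs(os.path.dirname(model_path), exist_ok=True)
torch.save(model.state_dict(), model_path)
print(os.path.abspath(model_path))  # this is the path clearml should be given
```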
For the record, this is a minimal reproducible example:
Local folder structure:
```
├── remove_folder
│   ├── batch_0
│   │   ├── file_0_0.txt
│   │   ├── file_0_1.txt
│   │   ├── file_0_2.txt
│   │   ├── file_0_3.txt
│   │   ├── file_0_4.txt
│   │   ├── file_0_5.txt
│   │   ├── file_0_6.txt
│   │   ├── file_0_7.txt
│   │   ├── file_0_8.txt
│   │   └── file_0_9.txt
│   └── batch_1
│       ├── file_1_0.txt
│       ├── file_1_1.txt
│       ├── file_1_2.txt
│       ├── file_1_3.txt
│       ├── fi...
```
Isitdown seems to be reporting it as up. Any issues with other websites?
@<1523701949617147904:profile|PricklyRaven28> Please use this patch instead of the one previously shared. It excludes the dict hack :)