Happens pretty much consistently across all our projects -
Have a project with over 15 tasks (i.e. one that needs the Load More button).
1. Click Load More and select a task that's not in the first 15.
2. Let the page "rest" for a while (a couple of hours).
3. Flip back to the page - the task is still active, but you cannot see it in the task list and there is no more Load More button.
Each user creates a .env file for their needs, or exports the variables in the shell running the Python code. Currently I copy the environment variables to an S3 bucket and download them from there.
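Roughly the kind of thing I mean, as a sketch (the bucket path and the use of python-dotenv are assumptions for illustration, not our exact setup):
```
# Sketch only: fetch a shared .env from S3 and load it into the process environment.
# "s3://our-bucket/shared/.env" and python-dotenv are placeholders/assumptions.
from clearml import StorageManager
from dotenv import load_dotenv

env_path = StorageManager.get_local_copy(remote_url="s3://our-bucket/shared/.env")
load_dotenv(env_path)
```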
Is there some default Docker image you ship with ClearML that you'd recommend, or can/should we use our own? 🙂
That is, we have something like:
```
task = Task.init(...)
ds = Dataset.create(dataset_name=task.name, dataset_project=task.get_project_name(), use_current_task=True)
# upload files
ds.upload(show_progress=True)
ds.finalize()
# do stuff with task and dataset
task.close()
```
But because the dataset is linked to the task, the task is then moved and effectively becomes invisible 😕
Any thoughts AgitatedDove14 SuccessfulKoala55 ?
I also tried setting agent.python_binary: "/usr/bin/python3.8" but it still uses Python 2.7?
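For reference, I set it in the agent section of clearml.conf, roughly like this (the path just points at my local Python 3.8):
```
# clearml.conf (agent machine)
agent {
    python_binary: "/usr/bin/python3.8"
}
```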
I'll see if we can do that still (as the queue name suggests, this was a POC, so I'm trying to fix things before they give up 😛 ).
Any other thoughts? The original thread https://clearml.slack.com/archives/CTK20V944/p1641490355015400 suggests this PR solved the issue
Btw TimelyPenguin76 this should also be a good starting point:
First create the target directory and add some files:
```
sudo mkdir /data/clearml
sudo chmod -R 777 /data/clearml
touch /data/clearml/foo
touch /data/clearml/bar
touch /data/clearml/baz
```
Then list the files using the StorageManager. It shouldn't take more than a few milliseconds.
```
from clearml import StorageManager

%%timeit
StorageManager.list("/data/clearml")
-> 21.2 s ± 328 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
At least as far as I can tell, nothing else has changed on our systems. Previous pip versions would warn about this, but not crash.
SuccessfulKoala55 The changelog wrongly cites https://github.com/allegroai/clearml/issues/400 btw. It is not implemented and is not related to being able to save CSVs 😅
Hey @<1523701070390366208:profile|CostlyOstrich36> , thanks for the reply!
I’m familiar with the above repo, we have the ClearML Server and such deployed on K8s.
What’s lacking is documentation regarding the clearml-agent helm chart. What exactly does it offer, etc.
We’re interested in e.g. using karpenter to scale our deployments per demand, effectively replacing the AWS autoscaler.
It also happens when use_current_task=False though. So the current best approach would be to not combine the task and the dataset?
Since this is a single process, most of these are only needed once when our "initializer" task starts and loads.
Yeah I figured (2) would be the way to go actually 😄
Say I have Task A that works with some dataset (which is not hard-coded, but perhaps e.g. self-defined by the task itself).
I'd now like to clone Task A and modify some stuff, but still use the same dataset (no need to recreate it, but since it's not hard-coded, I have to maintain a reference somewhere to the dataset ID).
Since the Dataset SDK offers use_current_task, I would have also expected there to be something like dataset.link(task) or task.register_dataset(ds)...
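In the meantime I guess the workaround is to carry the dataset ID as a task parameter, so a clone can still resolve (or override) it - a rough sketch of the idea, not an existing link API:
```
# Sketch: keep the dataset reference as an editable task parameter,
# so a cloned Task A can point at the same (or a different) dataset.
# Project/dataset names are placeholders.
from clearml import Task, Dataset

task = Task.init(project_name="my_project", task_name="Task A")
ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
task.set_parameter("General/dataset_id", ds.id)

# When the task (or a clone of it) runs, resolve the dataset from the parameter:
dataset_id = task.get_parameter("General/dataset_id")
ds = Dataset.get(dataset_id=dataset_id)
local_copy = ds.get_local_copy()
```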
The odd thing is that it was already defined, and then when I clicked an S3 link, it asked me to fill it in again, adding a duplicate credentials row
After setting the sdk.development.default_output_uri in the configs, my code kinda looks like:
```
task = Task.init(project_name=..., task_name=..., tags=...)
logger = task.get_logger()
# report with logger freely
```
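And the matching config bit, for completeness (the bucket URL is a placeholder):
```
# clearml.conf
sdk {
    development {
        default_output_uri: "s3://my-bucket/clearml-output"
    }
}
```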
Basically you have the details from the Dataset page, why should it be mixed with the others?
Because maybe it contains code and logs on how to prepare the dataset. Or maybe the user just wants increased visibility for the dataset itself in the tasks view.
why would you need the Dataset Task itself is the main question?
For the same reason as above. Visibility and ease of access. Coupling relevant tasks and dataset in the same project makes it easier to understand that they're...
AgitatedDove14
I'll make a PR for it now, but the long story is that you have the full log, but the virtualenv version is not logged anywhere (the usual output from virtualenv just says which Python version is used, etc).
We can change the project names of course, if there’s a suggestion/guide that will make them see past the namespace…
I’d like to refrain from manually specifying the dependencies, since it adds a lot of overhead to extend
I've also tried e.g. setting agent.package_manager.priority_packages = ["poetry"], and/or agent.package_manager.poetry_version = ">1.2.0", and other flags, but these affect only the main /clearml_agent_venv environment, and not the one actually generated by the clearml-agent when executing the task
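For context, the relevant part of my clearml.conf looks roughly like this (values from memory, may not match exactly):
```
# clearml.conf (agent machine)
agent {
    package_manager {
        type: poetry
        poetry_version: ">1.2.0"
        priority_packages: ["poetry"]
    }
}
```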
I'm not too worried about the dataset appearing (or not) in the Datasets tab. I would like it (the original task) to not disappear from the original project I assigned it to
I'm not entirely sure I understand the flow but I'll give it a go. I have two final questions:
This seems to only work for a single file (weights_path implies a single file, not multiple ones). Is that the case? Why do you see this as preferred to the dataset method we have now? 🤔
It's not exactly "debugging", but rather a description of the generated model/framework (generated with pygraphviz).
I guess it does not do so for all settings, but only those that come from Session()
Opened a matching feature request issue for this -> https://github.com/allegroai/clearml/issues/418
I'm not sure how the decorators achieve that; from the available examples and trials I've done, it seems that:
- Components anyway need to be available when you define the pipeline controller/decorator, i.e. same codebase
- The component code still needs to be self-composed (or, function component can also be quite complex)
- Decorators do not allow any dynamic build, because you must know how the components are connected at decoration time
With that said, it could be that the provided example...
Also, creating from functions allows dynamic pipeline creation without requiring the tasks to pre-exist in ClearML, which is IMO the strongest point to make about it
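Roughly what I have in mind with function steps (a minimal sketch; the step name, arguments, and project are made up):
```
# Sketch: assemble a pipeline at runtime from plain functions.
# Steps could be added dynamically (e.g. in a loop or from config)
# without any pre-existing ClearML tasks.
from clearml import PipelineController

def prepare_data(source_url: str):
    # ...fetch and preprocess; placeholder body...
    return source_url

pipe = PipelineController(name="dynamic-pipeline", project="examples", version="0.0.1")
pipe.add_function_step(
    name="prepare_data",
    function=prepare_data,
    function_kwargs=dict(source_url="s3://bucket/raw"),
    function_return=["prepared"],
)
pipe.start_locally(run_pipeline_steps_locally=True)
```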
Yes, using this extra_clearml_conf parameter you can add configuration
This is again exposing the environment variables on the WebUI for everyone to see.
The idea was to specify just the names of the environment variables, and that those would be exposed automatically to the EC2 instance, without specifying what values they should have (the value is taken from the agent running the scaler)