it might be an issue in the UI due to this unconventional address or network settings
I think this is related to https://github.com/allegroai/clearml-server/issues/112#issue-1149080358 which seems to be a recurring issue across many different setups
task.get_parameters and task.get_parameters_as_dict have the keyword argument cast, which attempts to convert values back to their original type, but interestingly it doesn't seem to work for properties:
```
task = Task.init()
task.set_user_properties(x=5)
task.connect({"a": 5})
task.get_parameters_as_dict(cast=True)
# returns {'General': {'a': 5}, 'properties': {'x': '5'}}
```
Hopefully this would be a relatively easy extension of get_user_properties!
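In the meantime, a rough workaround sketch for casting the properties myself (the helper name is mine; it assumes the property values come back as plain strings that ast.literal_eval can parse):
```python
import ast
from clearml import Task

def get_user_properties_cast(task: Task) -> dict:
    # Hypothetical helper: user property values are returned as strings,
    # so try to parse each one back to its original Python type.
    casted = {}
    for name, value in task.get_user_properties(value_only=True).items():
        try:
            casted[name] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            casted[name] = value  # leave values that aren't Python literals as-is
    return casted
```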
Could well be the same as https://github.com/allegroai/clearml-server/issues/112 which is also discussed at https://clearml.slack.com/archives/CTK20V944/p1648547056095859 🙂
CumbersomeCormorant74 just to confirm, in my case the files aren't actually deleted - I have to manually delete them from the fileserver via a terminal
Thanks CumbersomeCormorant74
I think a note about the fileserver should be added to the https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_security page!
Yep, GCP. I wonder if it's something to do with Container-Optimized OS, which is how I'm running the agents
I ran into something similar - in my case I'd actually cloned the repository using the address without the git@ prefix (something still made it work). ClearML read it from the remote repository URL and used it. When I updated the URL of the remote repository in my git client, it then worked.
connect_configuration seems to take about the same amount of time, unfortunately!
Another option would be to do task.close() followed by task.reset(), and then execute an agent to pick up that task, but I don't think reset is part of the public API. Is this risky?
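Roughly what I have in mind (a sketch only - the project/queue names are placeholders, and as mentioned reset isn't documented):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="rerun-me")  # placeholder names
# ... run the local part of the job ...

task.close()   # stop reporting from this process
task.reset()   # clear the run state so the task can be executed again (not documented/public)
Task.enqueue(task, queue_name="default")  # placeholder queue; an agent then picks the task up
```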
I guess two more straightforward questions:
- Could it be made possible for task.execute_remotely(clone=False, exit_process=False) to not raise an exception (see the sketch below)? I'm happy to work on a PR if this would be possible.
- Is there any issue with having task.reset() in the public API / are there any potential issues with using it?
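For the first question, the pattern I'm after would look something like this (a sketch with placeholder names; today the execute_remotely call below raises):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="hand-off")  # placeholder names
# ... configure the task locally ...

# Desired behaviour: hand this same task (clone=False) over to an agent
# while keeping the current process alive (exit_process=False).
task.execute_remotely(queue_name="default", clone=False, exit_process=False)

# ... continue with other work in this process ...
```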
Ah right, nice! I didn't think it was, as I couldn't see it in the Task reference - should it be there too?
And here is a PR for the other part.
Hi CostlyOstrich36, thanks for the response - that makes sense.
What sort of problems could happen - would it just be corruption of the data being written, or could it be something more breaking?
For context, I'm currently backing up the server (spinning it down) every night, but now I need to run tasks overnight and don't want to have any missed logs/artifacts while the server is shut down.
Ok, thanks Jake!
And regarding the first question - edit your ~/clearml.conf
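For reference, the relevant block in ~/clearml.conf looks roughly like this (the host and bucket below are placeholders, and pointing files_server at a cloud URI is my assumption about what was meant):
```
api {
    web_server: http://my-clearml-server:8080
    api_server: http://my-clearml-server:8008
    # default files server handed out to the SDK/agents; placeholder address
    files_server: "gs://my-clearml-bucket"
    credentials {
        "access_key" = "..."
        "secret_key" = "..."
    }
}
```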
That would change what file server is used by me locally or by an agent, yes, but I want to change what is shown by the GUI, so that would need to be a setting on the server itself?
CostlyOstrich36 thanks for getting back to me!
yes!
That's great! Please can you let me know how to do it/how to set the default files server?
However, it would be advisable to also add the following argument to your code:
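(The snippet itself didn't survive here; judging by the output_uri discussion below, it's presumably along these lines - a sketch with placeholder names and a placeholder bucket:)
```python
from clearml import Task

task = Task.init(
    project_name="examples",              # placeholder project name
    task_name="upload-artifacts",         # placeholder task name
    output_uri="gs://my-clearml-bucket",  # placeholder destination for models/artifacts
)
```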
That's useful thanks, I didn't know about this kwarg
I think you should open a GitHub feature request since there is currently no way to do this via the UI
Will do. Is there a way to do it not via the UI? E.g. in the server configuration (I'm running a self-hosted server)?
When you generate new credentials in the GUI, it comes up with a section to copy and paste into either clearml-init or ~/clearml.conf. I want the files server displayed here to be a GCP address
And what is the difference in behaviour between Task.init(..., output_uri=True) and Task.init(..., output_uri=None)?
Thanks @<1523701087100473344:profile|SuccessfulKoala55>, I've taken a look - is this the force merging you're referring to? Do you know how often ES is configured to merge in the clearml server?
Shards that I can see are using a lot of disk space are:
- events-training_stats_scalar
- events-log
- and then various worker_stats_* indices
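For reference, this is roughly how I've been checking which indices take up the most space (assuming Elasticsearch is reachable on the default port of the clearml-server host):
```python
import requests

# List the indices sorted by on-disk size, largest first.
resp = requests.get(
    "http://localhost:9200/_cat/indices",
    params={"v": "true", "s": "store.size:desc", "h": "index,docs.count,store.size"},
)
print(resp.text)  # e.g. events-training_stats_scalar*, events-log*, worker_stats_* ...
```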
Tasks are running locally and recording to our self-deployed server, and there is no output in my task log that indicates an issue. This is all of the console output:
2023-01-09 12:53:22 ClearML Task: created new task id=7f94e231d8a04a8c9592026dea89463a ClearML results page:
2023-01-09 12:53:24 ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
Are there any logs in the server I can check? The server is running v1.3.1 and the issue I'm seeing is with version 1....