
Hmm, conda_freeze in the clearml.conf on the development machine?
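If it helps, this is what I mean; a minimal sketch, assuming the detect_with_conda_freeze flag in the sdk.development section is the one in question:
sdk {
    development {
        # store the full conda environment instead of analyzing the imports
        detect_with_conda_freeze: true
    }
}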
check if the fileserver docker is running with docker ps
Hi FunnyTurkey96
Let me check what's the status here
(BTW: Is this for a specific Task or for a specific Project?)
Hi @<1795626098352984064:profile|SoggyElk61>
Were you able to pass the ClearMLVisBackend
line in your code?
This needs to be added before your actual code
WackyRabbit7 I'll make sure it is fixed
Failing when passing the diff to the git command...
Hi @<1673501379764686848:profile|VirtuousSeaturtle4>
What I don't get is that the example does not refer to a bucket path. What bucket path should I specify?
you mean to store data?
Hi MelancholyChicken65
I'm assuming you need ssh protocol, not https user/token; set this one to true 🙂
force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/76c533a2e8e8e3403bfd25c94ba8000ae98857c1/docs/clearml.conf#L39
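i.e. a minimal sketch of the agent section:
agent {
    # force git to clone over ssh even when the repository url is http/https
    force_git_ssh_protocol: true
}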
ElegantCoyote26 I don't think Keras logs it anywhere unless you have TB, so nowhere to take the data from...
In short, yes you have to have TB :)
Parent makes sense if you are changing the data of the parent version but some of the data is preserved, which lets the delta-based storage store only the diff.
If everything is different, and you call sync
for example, then it will not reference any previous "snapshot", so there will be no redundancy in storage, but you still get a pointer to the "parent" version.
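For reference, a minimal sketch of the flow (the project/folder names here are made up):
from clearml import Dataset

parent = Dataset.get(dataset_project="data", dataset_name="raw")
# the child version references the parent; unchanged files are not re-uploaded
child = Dataset.create(dataset_project="data", dataset_name="raw", parent_datasets=[parent.id])
# sync_folder computes the delta against the parent snapshot
child.sync_folder(local_path="./raw_data")
child.upload()
child.finalize()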
Make sense ?
great 🙂
two things:
1. I'm not sure argparse supports dict as a type (I mean it will take anything, but I'm not sure it will parse your arguments as a dict)
2. I know there was an issue with argparsing, but I think it was solved
btw: basically the way clearml-agent works, it does not actually pass the arguments on the command line, but feeds them directly to the argparser at runtime
What happens if you clone the Task (the one with Args showing, and without the explicit task.connect(_args))
and send it to the age...
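To illustrate the auto-connection I mean (a sketch, not your exact code):
import argparse
from clearml import Task

# Task.init hooks argparse automatically, so no explicit task.connect(args) is needed;
# when the clone is executed by an agent, the stored Args are fed straight into the parser
task = Task.init(project_name="examples", task_name="argparse demo")

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
args = parser.parse_args()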
Hi @<1600299043865497600:profile|MagnificentSeaurchin90>
Any chance you can provide more info on the error?
if I want to compare two experiments the scalar plots do not load (loading forever).
I'm assuming the issue is the Plots tab? or is it the Scalars? what do you have in the Plots? can you send an image of the single experiment?
I'm not sure TB supports confusion matrices regardless; from anywhere in your code you can do:
from trains import Task
Task.current_task().get_logger().report_confusion_matrix(...)
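A fuller sketch, if it helps (the matrix values are made up):
import numpy as np
from trains import Task

task = Task.init(project_name="examples", task_name="confusion matrix")
matrix = np.array([[50, 2], [3, 45]])  # rows: true label, columns: predicted label
task.get_logger().report_confusion_matrix(title="confusion", series="val", matrix=matrix, iteration=0)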
Hi JitteryCoyote63
Wait a few hours, there is a new fix, I'll make sure we upload it later today (scheduled to be there anyhow, I'll push it forward)
LOL @<1545216070686609408:profile|EnthusiasticCow4>
I assume this is a hidden folder?
for example datasets are hidden folders that can be viewed if you go to the settings page and turn on "show hidden folders"
I'm not sure about the intended use of connect_configuration now.
Basically here is the rationale behind it:
- I have a config file that I want to log on the Task, and I also want to be able to change this configuration file externally when launching using an agent (i.e. edit the content)
- I have a nested dictionary that I do not want to flatten and push as hyper-parameters because it is not very readable, so I want to store it in a more human-readable form and edit it a...
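Something along these lines (a sketch; "config.yaml" is just a placeholder name):
from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")
# logs the file content on the Task; when executed by an agent, the returned path
# points to a local copy of the (possibly edited) configuration from the server
config_path = task.connect_configuration(name="my config", configuration="config.yaml")
with open(config_path) as f:
    config = f.read()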
in order to work with ssh cloning, one has to manually install openssh-client in the docker image, it seems
Correct, you have to have SSH inside the container so that git can use it.
You can always install with the following setup inside your agent's clearml.conf:
extra_docker_shell_script: ["apt-get install -y openssh-client", ]
https://github.com/allegroai/clearml-agent/blob/73625bf00fc7b4506554c1df9abd393b49b2a8ed/docs/clearml.conf#L145
Just making sure I understand: you want to upload your models with clearml to the Yandex-compatible S3 storage?
It seems there is some async behavior going on. After ending a run, this prompt just hangs for a long time:
2021-04-18 22:55:06,467 - clearml.Task - INFO - Waiting to finish uploads
And there's no sign of updates on the dashboard
Hmm, that could point to an issue uploading the last images (which are larger than regular scalars). Could you try flushing and waiting?
i.e. task.flush() then sleep(45)
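A sketch (wait_for_uploads is optional; it blocks until the report/upload threads drain):
from time import sleep
from clearml import Task

task = Task.current_task()
task.flush(wait_for_uploads=True)
sleep(45)  # leave extra time for the larger image uploads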
Like, if you google "dagster and clearml" or "prefect and clearml" or "airflow and clearml" -- I don't find any blogs written by people talking about how they use both of them together.
Oh yeah, I see your point. I think the main reason is that a lot of the DAG capabilities and the orchestration are already folded into clearml's capabilities (i.e. pipelines + clearml-agent etc.)
That said, I'm pretty sure I have seen people just adding Task.init into each of the above frameworks' steps, in order to t...
JitteryCoyote63 see if upgrading the packages as they suggest somehow fixes it.
I have the feeling this is the same problem (the first error might be trains masking the original error)
It just seems frozen at the place where it should be spinning up the tasks within the pipeline
And is there an agent for those? Usually there is one agent for running logic tasks (like pipelines), running with --services-mode,
which means multiple Tasks can be executed by the same agent, and other agents for compute Tasks that are a single Task per agent (but you can run multiple agents on the same machine)
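For example (the queue names here are just the conventional ones):
# one agent in services mode for logic Tasks (pipelines, schedulers, etc.)
clearml-agent daemon --services-mode --queue services --docker --detached
# regular agent for compute Tasks, a single Task at a time
clearml-agent daemon --queue default --detached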
DepressedChimpanzee34
I am actually curious now, why is the default like this? maybe more people are facing similar bottlenecks?
On "regular" load there is no need for multiple processes, and the memory consumption might be more important than reply lag (at least before you start to scale)
DisturbedWalrus17
By spawning multiple processes for the API server, it looks like we utilise the CPU more now but the UI and API calls are still lagging a lot
Can you try with even more ...
I think I'm missing the connection between the hash-ids and the txt file, or in other words why the txt file contains a full path rather than a relative path
(But in venv mode it also hangs the same way)
Hmm this is strange, could it be you are running out of storage ?
actually the issue is that the packages are not being detected 😞
what happens if you do the following?
Task.add_requirements("tensorflow")
task = Task.init(...)
Would an implementation of this kind be interesting for you, or do you suggest forking?
You mean adding a config map storing a default trains.conf for the agent?
- ...that file and the logs of the agent service always say the same thing as before:
Oh, in that case you need to fill in your credentials here:
https://github.com/allegroai/clearml-server/blob/5de7c120621c2831730e01a864cc892c1702099a/docker/docker-compose.yml#L137
Basically CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY will let the agent running inside the docker talk to the server itself. Just put your own credentials there as a start, it should solve the issue
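i.e. something along these lines in the relevant service's environment section (a sketch based on the linked compose file; put your own values):
environment:
  CLEARML_API_HOST: http://apiserver:8008
  CLEARML_API_ACCESS_KEY: <your-access-key>
  CLEARML_API_SECRET_KEY: <your-secret-key>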