
Weird ?!, I see this in the code:
https://github.com/allegroai/clearml/blob/382d361bfff04cb663d6d695edd7d834abb92787/clearml/automation/controller.py#L2871
Hmmm could you attach the entire log?
Remove any info that you feel is too sensitive :)
store_code_diff_from_remote
doesn't seem to change anything regarding this issue
Correct, it is always from remote
I'll be using update_task, that worked just fine, thanks
Sure thing.
ShakyJellyfish91, I took a quick look at the diff between the versions. Can you hack a non-working version (preferably the latest) and verify the issue for me?
Hi IrateBee40
What do you have in your ~/clearml.conf?
Is it pointing to your clearml-server ?
I can but that is not a configuration we would want to run with in production
Agreed, I just want to isolate the issue. I think the lower-level Python interface is missing some configuration or environment variables.
No worries, just found it. Thanks!
I'll make sure to follow up on the GitHub issue for better visibility 🙂
Thanks GorgeousMole24
That is a very good point! Passing it along to the product guys.
Basically create a token and use it as user/password
EDIT:
With read-only permissions 🙂
Yes. Because my old issue has never been resolved (though closed), we use the Dataset object to upload e.g. local files needed for remote execution.
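For illustration, a minimal sketch of that pattern (project and file names here are placeholders, not from the thread):
from clearml import Dataset

# package local files needed at runtime so a remote machine can fetch them
ds = Dataset.create(dataset_project="examples", dataset_name="run_assets")
ds.add_files("local_config_dir/")  # any local files/folders needed for remote execution
ds.upload()
ds.finalize()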
Ohh no, I remember... Following this line, can I assume these files are reused, i.e. this is not "per instance"? I have to admit I have a feeling this is a very unique use case, and maybe the "old" way Datasets were shown is better suited?
No, I mean why does it show up in the task view (see attached image), forcing me to clic...
Hi SlimyRat21 :
Tool that will help me track and manage the different configs and simulation logs across different runs and versions of the simulation.
Definitely covered by Trains; it does that with very few code changes (if any) to your current code base.
Tool that will help me gather and compare the results from specific simulation runs
Same as above 🙂
Do you have any experience or tips on using Trains for non-ML before investing time into this and seeing...
Sounds good.
BTW, when the clearml-agent is set to use "conda" as package manager it will automatically install the correct cudatoolkit in any new venv it creates. The cudatoolkit version is picked directly when "developing" the code, assuming you have conda installed as the development environment (basically you can transparently do end-to-end conda, and not worry about CUDA at all).
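For reference, a minimal sketch of the relevant clearml.conf setting on the agent machine (everything else left at defaults, and assuming conda is installed there):
agent {
    package_manager {
        # with conda as the package manager the agent also resolves the matching cudatoolkit
        type: conda
    }
}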
So could it be that pip install --no-deps . is the missing piece?
What happens if you add "/opt/keras-hannd" to the installed packages?
Hi MelancholyChicken65
I'm not sure you can control it; the UI deduces the URL based on the address you are browsing to. So if you go to http://app.clearml.example.com you will get the correct ones, but you have to put them on the right subdomains:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#subdomain-configuration
What do you mean by a custom queue ?
In the queues page you have a plus button, this will just create a new queue
And can you see your Prometheus in your Grafana?
Hi AdventurousRabbit79
In the wizard
https://github.com/allegroai/clearml/blob/1ab3710074cbfc6a19dd8a57078b10b31b2df31a/examples/services/aws-autoscaler/aws_autoscaler.py#L214
Add the S3 section like you would in the clearml.conf:
https://github.com/allegroai/clearml/blob/1ab3710074cbfc6a19dd8a57078b10b31b2df31a/docs/clearml.conf#L73
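Something along these lines (the values are placeholders), mirroring the sdk.aws.s3 section of clearml.conf:
sdk {
    aws {
        s3 {
            key: "my-access-key"
            secret: "my-secret-key"
            region: "us-east-1"
        }
    }
}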
I see..
Generally speaking, if that is the case, I would think it might be better to use docker mode; it offers a much more stable environment, regardless of the host machine running the agent. Notice there is no need to use custom containers, as the agent will basically run the venv process, only inside a container, allowing you to reuse off-the-shelf containers.
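For example (queue name is a placeholder), launching the agent in docker mode looks roughly like:
clearml-agent daemon --queue default --docker
(you can append a specific image after --docker, otherwise the default image is used)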
If you were to add this, where would you put it? I can use a modified version of
clearml-agent
Yep, that would b...
Okay, some progress, so what is the difference ?
Any chance the issue can be reproduced with a small toy code ?
Can you run the tqdm loop inside the code that exhibits the CR issue ? (maybe some initialization thing that is causing it to ignore the value?!)
Hi JitteryCoyote63
I think there is a GitHub issue (feature request) on this; it is not very trivial to build (basically you need the agent to first temporarily pull the git, apply the changes, build the docker, remove the temp build, and restart with the new image).
Any specific reason for not pushing a docker, or using the extra docker bash script on the Task itself?
Create a new file, copy-paste these lines into it, and run it inside VSCode. What do you get in the console?
from clearml import Task

Task.add_requirements("tensorflow")
task = Task.init(project_name="debug", task_name="requirements")
print("done")
An easier fix for now will probably be some kind of warning to the user that a task is created but not connected
That is a good point, maybe if you do not have a "main" Task, then we print the warning (with some flag to disable the warning) ?
PompousParrot44 unfortunately not yet 🙂
But the gist is :
MongoDB stores experiment data (i.e. execution parameters, git ref etc.)
ElasticSearch stores results (i.e. metrics console logs, debug image links etc.)
Does that help?
Hi FloppyDeer99
Since this thread is a bit old, I might have missed something 🙂
Are we saying the links are not working in the UI ?
(notice the links themselves are generated by the clearml package, so if there was a bug, still not sure here, then old links will remain invalid until manually fixed) Can you verify that the latest clearml generates working links?
would I have to execute each task in the pipeline locally (but still connected to trains),
Somehow you have to have the pipeline step Task in the system, you can import it from code, or you can run it once, then the pipeline will clone it and reuse it. Am I missing something ?
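A minimal sketch of the second option (project/task names are placeholders): run the step once so it exists in the system, then reference it from the pipeline, which clones and reuses it:
from clearml.automation import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
# "step1" is assumed to be an existing Task that was already run once
pipe.add_step(name="stage1", base_task_project="examples", base_task_name="step1")
pipe.start_locally()  # or pipe.start(queue="...") to run through an agent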
I did nothing to generate a command-line. Just cloned the experiment and enqueued it. Used the server GUI.
Who/What created the initial experiment ?
I noticed that if I run the initial experiment by "python -m folder_name.script_name"
"-m module" as script entry is used to launch entry points like python modules (which is translated to "python -m script")
Why isn't the entry point just the python script?
The command line arguments are passed as arguments on the Args section of t...
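For illustration, a minimal sketch (argument names are placeholders) of how argparse arguments end up under the Args section once Task.init is called:
import argparse
from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)

task = Task.init(project_name="debug", task_name="argparse example")
args = parser.parse_args()  # arguments are picked up automatically and shown under Args
print(args.lr)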
Hi @<1610083503607648256:profile|DiminutiveToad80>
Yes, it does. They are also cached by default (on the machine with the agent)
None
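For illustration, a minimal sketch (project and dataset names are placeholders): the first call downloads the data, later calls on the same machine are served from the local cache:
from clearml import Dataset

ds = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
local_path = ds.get_local_copy()  # cached locally after the first download
print(local_path)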
Uninstall the current clearml-agent and reinstall this wheel, I hacked it to have ==, let's see if that works