Okay, could you try to run again with the latest clearml package from GitHub?
pip install -U git+
Hmm let me check something
Okay, I think this might be a bit of an overkill, but I'll entertain the idea 🙂
Try passing the user as key, and password as secret?
Hmm should not make a diff.
Could you verify it still doesn't work with TF 2.4 ?
After removing the task.connect lines, it encountered another error: 'einops' is not recognized. It does exist in my environment file but was not installed by the agent (according to what I see under 'Summary - installed python packages'). Should I add this manually?
Yes, I'm assuming this is a derivative package that is needed by one of your packages?
Task.add_requirements("einops")  # must be called before Task.init
task = Task.init(...)
Let me check, it was supposed to be automatically aborted
is the base Task a file or a notebook ?
which to my understanding has to be given before a call to an argparser,
SmarmySeaurchin8 You can call argparse before Task.init, no worries, it will catch the arguments and trains-agent will be able to override them :)
SmarmySeaurchin8
args = parser.parse_args()
task = Task.init(project_name=args.project or None, task_name=args.task or None)
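A minimal sketch of that flow. The `--project`/`--task` flag names and defaults are assumptions mirroring the snippet above, and the Task.init call is left commented out so the sketch runs without a trains server:

```python
import argparse

# Hypothetical flags mirroring the snippet above (assumption, not the real script)
parser = argparse.ArgumentParser()
parser.add_argument("--project", default=None)
parser.add_argument("--task", default=None)

# Parsing *before* Task.init is fine - the automagic still captures the
# arguments, and trains-agent can override them at execution time.
args = parser.parse_args(["--project", "demo"])

# from trains import Task
# task = Task.init(project_name=args.project or None, task_name=args.task or None)
print(args.project)  # demo
```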
You should probably look at the docstring 😉
:param str project_name: The name of the project in which the experiment will be created. If the project does not exist, it is created. If project_name is None, the repository name is used. (Optional)
:param str task_name: The name of Task (experiment). If task_name is None, the Python experiment ...
Regarding the project name:
set_project will support project_name in the next version 🙂 In the meantime:
project_id = [p.id for p in Task.get_projects() if p.name == project_name][0]
(We should probably better state it in the GitHub readme)
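The list-comprehension workaround can be sketched offline; `SimpleNamespace` here is only a stand-in for the project objects returned by `Task.get_projects()` (an assumption about their shape), and the ids/names are made up:

```python
from types import SimpleNamespace

def find_project_id(projects, project_name):
    # Same logic as: [p.id for p in Task.get_projects() if p.name == project_name][0]
    matches = [p.id for p in projects if p.name == project_name]
    if not matches:
        raise ValueError(f"project {project_name!r} not found")
    return matches[0]

# Stand-ins for Task.get_projects() results (hypothetical ids/names)
projects = [SimpleNamespace(id="abc123", name="demo"),
            SimpleNamespace(id="def456", name="other")]
print(find_project_id(projects, "demo"))  # abc123
```

Guarding the empty-match case avoids the bare IndexError the one-liner would raise when the project name does not exist.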
LuckyRabbit93 We do!!!
If this is a simple two level nesting:
You can use the section name:
task.connect(param['data'], name='data')
task.connect(param['model'], name='model')
Would that help?
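A rough illustration of what the section name buys you. The flattening below only mimics how the values end up stored as "section/key" pairs; the actual `task.connect` calls are commented out so this runs standalone, and the config values are made up:

```python
# Two-level nested configuration (example values are made up)
param = {
    "data": {"batch_size": 32, "shuffle": True},
    "model": {"layers": 4, "dropout": 0.1},
}

# from trains import Task
# task = Task.init(project_name="demo", task_name="nested-config")
# task.connect(param["data"], name="data")    # shows up under "data/..."
# task.connect(param["model"], name="model")  # shows up under "model/..."

# Mimic the "section name & key value" storage the comparison uses
flat = {f"{section}/{key}": value
        for section, sub in param.items()
        for key, value in sub.items()}
print(flat["data/batch_size"])  # 32
```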
The comparison reflects the way the data is stored in the configuration context, that is, section name & key value (which is what the code above does).
BTW: if you make the right column the baseline (i.e. move it to the left), you will get what you probably expected.
diff line by line is probably not useful for my data config
You could request a better configuration diff feature 🙂 Feel free to add to GitHub
But this also mean I have to first load all the configuration to a dictionary first.
Yes 😞
Hi LazyLeopard18
I remember someone deploying, specifically on the Azure k8s (can't remember now what they call it).
What is exactly the feedback you are after?
Okay, I'll make sure we change the default image to the runtime flavor of nvidia/cuda
FranticCormorant35 As far as I understand, what you have is a multi-node setup that you manage yourself, something like Horovod, Torch distributed, or any MPI setup. Since Trains supports all of the above standard multi-node setups, the easiest way is to do the following:
On the master node, set the OS environment variable:
OMPI_COMM_WORLD_NODE_RANK=0
Then on any client node:
OMPI_COMM_WORLD_NODE_RANK=unique_client_node_number
In all processes you can call Task.init - with all the automagic kicking in...
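A minimal sketch of that wiring, assuming the rank arrives via OMPI_COMM_WORLD_NODE_RANK exactly as above; the Task.init call is commented out (and the fallback "0" is just so the sketch runs when the variable is unset):

```python
import os

# On the master node this is exported as 0; on clients, a unique number.
# The default "0" is an assumption so the sketch runs when the var is unset.
node_rank = int(os.environ.get("OMPI_COMM_WORLD_NODE_RANK", "0"))
is_master = node_rank == 0

# from trains import Task
# task = Task.init(project_name="demo", task_name=f"node-{node_rank}")
print(node_rank, is_master)
```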
So if I plot an image with matplotlib, it would not upload? I need to use the logger?
Correct, if you have no "main" task , no automagic 😞
So how can I make it run with the "automagic"?
Automagic logs a single instance... unless those are subprocesses, in which case, the main task takes care of "copying" itself to the subprocess.
Again what is the use case for multiple machines?
Correct, and that also means the code that runs is not auto-magically logged.
CourageousLizard33 Are you using the docker-compose to setup the trains-server?
CourageousLizard33 so you have a Linux server running Ubuntu VM with Docker inside?
I would imagine that you could just run the docker on the host machine, no?
BTW, I think 8GB is a good recommendation for a VM; it's reasonable enough to start with. I'll make sure we add it to the docs.
CourageousLizard33 VM?! I thought we are talking fresh install on ubuntu 18.04?!
Is the Ubuntu in a VM? If so, I'm pretty sure 8GB will do, maybe less, but I haven't checked.
How much did you end up giving it?
Probably less secure though :)
Are you running inside a kubernetes cluster ?
And you want all of them to log into the same experiment ? or do you want an experiment per 60sec (i.e. like the scheduler)
Happy new year @<1618780810947596288:profile|ExuberantLion50>
- Is this the right place to mention such bugs?
Definitely the right place to discuss them; usually, if verified, we ask to also add them in GitHub for easier traceability / visibility.
... (i.e. there are two plots shown side-by-side, but they're actually both just the first experiment that was selected). This is happening across all experiments, all my workspaces, and all the browsers I've tried.
Can you share a screenshot? is this r...