I see, by default it will look for requirements.txt in the root of the repo (the actual repo).
That said, in code you can specify the requirements.txt:
```
Task.force_requirements_env_freeze(requirements_file='repo/project-a/requirements.txt')
task = Task.init(...)
```
Notice, you need to call it prior to the Task.init call
Hi DistinctToad76
Why not just report scalars? You can use the x-axis as "iterations" if this is running in real time to collect the prompts.
If this is a summary, then just report a scatter plot (you can also specify the names of the axes and the series)
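For reference, a minimal sketch of both options (project, title, and series names here are hypothetical):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='prompt reporting')  # hypothetical names
logger = task.get_logger()

# Option 1: real time - report a scalar per prompt, using the prompt index as the "iteration"
logger.report_scalar(title='prompts', series='score', value=0.87, iteration=42)

# Option 2: summary - report a scatter plot with custom axis names
logger.report_scatter2d(
    title='prompt summary', series='score per prompt',
    scatter=[(1, 0.87), (2, 0.91), (3, 0.78)],  # (prompt index, score)
    iteration=0, xaxis='prompt #', yaxis='score', mode='markers',
)
```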
WittyOwl57 could it be the EC2 instance is too small (i.e. not enough storage / memory) ?
Hi DrabCockroach54
This seems like a pip issue trying to install from source; try upgrading the pip version before installing numpy, it should solve it 🤞
it works if I run the same command manually.
What do you mean?
Can you do:
```
docker run -it <my container here> bash
```
Then immediately get an interactive bash ?
Hi JitteryCoyote63 ,
upload_artifact was designed to upload pre-made artifacts, which actually covers everything.
With register_artifact we tried to have something that will constantly log a pandas DataFrame artifact; the use case was the examples used for training and their order, so we could compare the execution of two different experiments and detect dataset contamination etc.
Not sure it is actually useful though ...
Retrieving an artifact from a Task is done by:
```
Task.get_task(task_id='aaa').artifact...
```
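A fuller sketch of the retrieval (the task id and the artifact name 'my_artifact' are placeholders):
```python
from clearml import Task

task = Task.get_task(task_id='aaa')        # the task that produced the artifact
artifact = task.artifacts['my_artifact']   # hypothetical artifact name
local_path = artifact.get_local_copy()     # download the file to the local cache
obj = artifact.get()                       # or deserialize it directly (e.g. a DataFrame)
```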
Hi AstonishingSwan80 , what do you mean by "ec2 API"?
HealthyStarfish45 the pycharm plugin is mainly for remote debugging, you can of course use it for local debugging but the value is just being able to configure your user credentials and trains-server.
In remote debugging, it will make sure the correct git repo/diff are stored alongside the experiment (this is due to the fact that pycharm will not sync the .git folder to the remote machine, so without the plugin Trains will not know the git repo etc.)
Is that helpful ?
Hi HealthyStarfish45
- is there an advantage in using tensorboard over your reporting?
Not unless your code already uses TB or has some built in TB loggers.
html reporting looks powerful, can one inject some javascript inside?
As long as the JS is self contained in the html script, anything goes :)
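For example, a minimal sketch of reporting a self-contained HTML page with embedded JS via report_media (project/file names and content are hypothetical):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='html report')  # hypothetical names

html = """
<html><body>
  <div id="out"></div>
  <script>
    // self-contained JS, no external resources
    document.getElementById('out').innerText = 'rendered by embedded JS';
  </script>
</body></html>
"""
with open('report.html', 'w') as f:
    f.write(html)

# upload and display the html page in the experiment's debug samples
task.get_logger().report_media(title='html', series='demo', iteration=0, local_path='report.html')
```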
there was a problem with index order when converting from pytorch tensor to numpy array
HealthyStarfish45 I'm assuming you are sending numpy to report_image (which makes sense). If you want to debug it, you can also test tensorboard add_image or matplotlib imshow; both will send debug images
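If the index-order issue is the channel dimension, a minimal sketch of the conversion (tensor shape and project/series names are hypothetical):
```python
import numpy as np
import torch
from clearml import Task

task = Task.init(project_name='examples', task_name='debug images')  # hypothetical names

t = torch.rand(3, 224, 224)                                 # pytorch tensors are usually CHW
img = (t.permute(1, 2, 0).numpy() * 255).astype(np.uint8)   # convert to HWC uint8 before reporting
task.get_logger().report_image(title='debug', series='sample', iteration=0, image=img)
```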
Okay, make sure that in your trains.conf
on all the trains-agent machines you add the following:
```
agent.extra_docker_arguments: ["-v", "/etc/hosts:/etc/hosts",]
```
Yes, the left side is the location of the file on the host machine, the right side is the location of the file inside the docker; in our case it is the same location
SuperiorDucks36, is the domain name "rz-s-git"? This does not seem like a valid domain.
EDIT:
Is it a local domain on your network?
what do you have on the trains-agent machine in "/etc/hosts"
This will mount the trains-agent machine's hosts file into the docker
Is trains-agent using docker-mode or virtual-env ?
Hi SuperiorDucks36
Could you post the entire log?
(could not resolve host seems to be coming from the "git clone" call).
Are you able to manually clone the repository on the machine running trains-agent?
Please attach the log 🙂
hmm I assume the reason is the cookie / storage changed?
Hi VivaciousWalrus99
Could you attach the log of the run ?
By default it will use the python it is running with.
Any chance the original experiment was executed with python2 ?
LudicrousParrot69
I "think" I have a better handle on what you wish to do.
Is it kind of generic "serving" solution?
FYI:
A model artifact is, usually, a weights/model file. The idea is that later you will be able to access it and serve it. Now the problem is (and I think this is what you are referring to) there is usually a specific piece of code tied to that model that can use it (a.k.a. pyfunc)
A few ideas:
These days everyone is trying to build their models with a generic interface, so that scik...
Hi LudicrousParrot69
I guess you are right, this is not a trivial distinction:
min: means we are looking for the minimum value of a specific scalar. Meaning 1.0, 0.5, 1.3 -> the optimizer will get these direct values and will optimize based on that
global min: means the optimizer is getting the minimum value of the specific scalar so far. With the same example: 1.0, 0.5, 1.3 -> the HPO optimizer gets 1.0, 0.5, 0.5
The same holds for max/global_max, makes sense?
Correct, which makes sense if you have a stochastic process and you are looking for the best model snapshot. That said I guess the default use case would be min/max (and not the global variant)
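A minimal sketch of where this is configured (project, metric, and parameter names are hypothetical, and this assumes the global variant is selected via objective_metric_sign, e.g. 'min_global'):
```python
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformParameterRange

base_task = Task.get_task(project_name='examples', task_name='base training')  # hypothetical
optimizer = HyperParameterOptimizer(
    base_task_id=base_task.id,
    hyper_parameters=[UniformParameterRange('General/lr', min_value=1e-4, max_value=1e-1)],
    objective_metric_title='validation',
    objective_metric_series='loss',
    # 'min' optimizes on the reported values directly, 'min_global' on the best value so far
    objective_metric_sign='min_global',
    execution_queue='default',
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```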
Found it, definitely a bug in the callback, it has no effect on the HPO process itself
... grab the model artifacts for each, put them into the parent HPO model as its artifacts, and then go through and archive everything.
Nice. Wouldn't it make more sense to "store" a link to the "winning" experiment? So you know how to reproduce it, and the set of HPs that were chosen?
Not that the model is bad, but how would I know how to reproduce it, or retrain when I have more data etc.
LudicrousParrot69 I would advise the following:
- Put all the experiments in a new project
- Filter based on the HPO tag, and sort the experiments based on the metric we are optimizing (see adding custom columns to the experiment table)
- And select + archive the experiments that are not used

BTW: I think someone already suggested we do the auto archiving inside the HPO process itself. Thoughts?
Doesn't solve the issue if an HPO run is going to take a few days
The HPO Task has a table of the top performing experiments, so when you go to the "Plot" tab you get a summary of all the runs, with the Task ID of the top performing one.
No need to run through the details of the entire experiments, just look at the summary on the HPO Task.
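Programmatically, the same summary can be pulled from the optimizer object (a minimal sketch, assuming `optimizer` is the HyperParameterOptimizer instance from the sketch above):
```python
# returns the best performing Task objects, ordered by the objective
top_tasks = optimizer.get_top_experiments(top_k=3)
for t in top_tasks:
    print(t.id, t.name)
```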
Bugs, definitely GitHub, this is the easiest to track.
Documentation, if these are small issues, Slack is fine, otherwise, GitHub issue.
Regarding the documentation, we are working on another iteration of improvement, but if you find inaccuracies/broken links please report 🙂